Length of an exact sequence match, as start region for the final alignment
A BLAST search starts by finding a perfect sequence match of the length given by -word_size. This initial exact match is then extended in both directions, allowing gaps and substitutions according to the scoring thresholds.
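The seeding step can be illustrated with a short Python sketch. This is a simplified illustration of exact word matching on nucleotide strings, not BLAST's actual implementation; the function name `find_seeds` and the example sequences are made up for this example.

```python
from __future__ import annotations


def find_seeds(query: str, subject: str, word_size: int) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where a word of length word_size matches exactly."""
    # Index every word of the query by its start position(s).
    index: dict[str, list[int]] = {}
    for q in range(len(query) - word_size + 1):
        index.setdefault(query[q:q + word_size], []).append(q)

    # Scan the subject and report every position whose word also occurs in the query.
    seeds = []
    for s in range(len(subject) - word_size + 1):
        for q in index.get(subject[s:s + word_size], []):
            seeds.append((q, s))
    return seeds


query = "ACGTTGCA"        # hypothetical query
subject = "TTACGTTGGA"    # hypothetical subject with one substitution near the end

print(find_seeds(query, subject, 4))  # [(0, 2), (1, 3), (2, 4)]
print(find_seeds(query, subject, 7))  # []
```

With a word size of 4 the shared stretch ACGTTG yields three overlapping seeds; with a word size of 7 the single substitution leaves no exact word in common, so nothing would be extended.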
Changing the initial word-size can help to find more, but less accurate, hits, or to limit the results to almost perfect hits.
Decreasing the word-size will increase the number of detected homologous sequences, but the hits can include more fragmented alignments due to gaps and substitutions (example: search for homologous genes between distant species, see also: -task blastn).
Increasing the word-size will give fewer hits, as it requires a longer continuous region of exact match. If the word-size is chosen to be almost the size of the query, BLAST will search for almost exact matches (example: search for the location of gene sequences in the original genome of the gene).
For short sequences, word-size must be less than half the query length, otherwise reliable hits can be missed.
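One way to see the reason for this rule: a single substitution near the middle of a short query splits it into two exact runs of roughly half the query length, so a larger word-size leaves no exact word to start from. The Python sketch below checks this on a made-up 20-nucleotide example; `longest_exact_run` is a hypothetical helper, and the comparison is ungapped and position by position, which is a simplification.

```python
def longest_exact_run(query: str, subject: str) -> int:
    """Length of the longest run of identical positions in an ungapped, same-length pairing."""
    best = current = 0
    for a, b in zip(query, subject):
        current = current + 1 if a == b else 0
        best = max(best, current)
    return best


query = "ACGTACGTACGTACGTACGT"    # 20 nt, hypothetical
subject = "ACGTACGTACTTACGTACGT"  # same sequence with one substitution at position 10 (0-based)

longest = longest_exact_run(query, subject)
print(longest)  # 10

# A seed can only exist if the word-size fits inside the longest exact run.
for word_size in (7, 9, 11):
    print(word_size, "seed possible" if word_size <= longest else "no seed")
```

Here word-sizes 7 and 9 still find a seed, while 11 (more than half of the 20-nucleotide query) misses a hit that differs from the query by only one base.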
nucleotide sequence search blastn with default megablast (blastn): -word_size 28
nucleotide sequence search blastn only (blastn -task blastn): -word_size 11
amino acid search (blastp): -word_size 3
Setting the word-size to a very low value (-word_size 5) makes a blastn search very slow.
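As a usage sketch, the options discussed here can be assembled into a blastn command line from Python. The helper `blastn_command`, the file name `query.fasta` and the database name `genome_db` are placeholders for this example; only the blastn options -query, -db, -task and -word_size are taken from BLAST+ itself.

```python
import shlex


def blastn_command(query_fasta, database, word_size=None, task=None):
    """Build a blastn argument list; options left as None fall back to the BLAST+ defaults."""
    cmd = ["blastn", "-query", query_fasta, "-db", database]
    if task is not None:
        cmd += ["-task", task]
    if word_size is not None:
        cmd += ["-word_size", str(word_size)]
    return cmd


# Default megablast search (word-size 28 unless overridden).
print(shlex.join(blastn_command("query.fasta", "genome_db")))

# More sensitive search for distant homologs: blastn task with a smaller word-size.
print(shlex.join(blastn_command("query.fasta", "genome_db", task="blastn", word_size=7)))
```

If BLAST+ is installed, the returned list can be passed to subprocess.run to execute the search.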