In practice, the method of feasible string-search algorithm may be affected by the string encoding.
In particular, if a variable-width encoding is in use, then it may be slower to find the Nth character, perhaps requiring time proportional to N. This may significantly slow some search algorithms.
One of many possible solutions is to search for the sequence of code units instead, but doing so may produce false matches unless the encoding is specifically designed to avoid it.
For example, one might want to match the "needle" only where it consists of one (or more) complete words—perhaps defined as not having other letters immediately adjacent on either side.
In that case a search for "hew" or "low" should fail for the example sentence above, even though those literal strings do occur.
For many purposes, a search for a phrase such as "to be" should succeed even in places where there is something else intervening between the "to" and the "be": Many symbol systems include characters that are synonymous (at least for some purposes): Finally, for strings that represent natural language, aspects of the language itself become involved.
For example, to catch both the American English word "color" and the British equivalent "colour", instead of searching for two different literal strings, one might use a regular expression such as: where the "?"
A similar problem introduced in the field of bioinformatics and genomics is the maximal exact matching (MEM).
In the normal case, we only have to look at one or two characters for each wrong position to see that it is a wrong position, so in the average case, this takes O(n + m) steps, where n is the length of the haystack and m is the length of the needle; but in the worst case, searching for a string like "aaaab" in a string like "aaaaaaaaab", it takes O(nm) In this approach, backtracking is avoided by constructing a deterministic finite automaton (DFA) that recognizes a stored search string.
time under the assumption that the alphabet has a constant size and all inner nodes in the suffix tree know what leaves are underneath them.