String metric

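A string metric measures the distance between two text strings, so that similar strings score as close. For example, the strings "Sam" and "Samuel" can be considered to be close.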

The best-known example is the Levenshtein distance, also called edit distance.[2] It operates between two input strings, returning the minimum number of single-character insertions, deletions, and substitutions needed to transform one input string into the other.
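The edit-distance computation described above can be sketched with the usual dynamic-programming recurrence. This is a minimal illustration, assuming the standard Levenshtein definition (unit cost for each insertion, deletion, or substitution); the function name `levenshtein` is chosen here for illustration and does not come from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] is the distance between the prefix of a processed so far
    # (initially the empty prefix) and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein("Sam", "Samuel"))  # three insertions: "u", "e", "l"
```

Only two rows of the table are kept at a time, so memory use grows with the length of the second string rather than with the product of both lengths.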

Beyond simple string metrics such as Levenshtein distance, the field has expanded to include phonetic, token-based, grammatical, and character-based methods of statistical comparison.

String metrics are used heavily in information integration, and in areas including fraud detection, fingerprint analysis, plagiarism detection, ontology merging, DNA analysis, RNA analysis, image analysis, evidence-based machine learning, database data deduplication, data mining, incremental search, data integration, malware detection,[3] and semantic knowledge integration.

There also exist functions which measure a dissimilarity between strings, but do not necessarily fulfill the triangle inequality, and as such are not metrics in the mathematical sense.
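To make the triangle-inequality point concrete, the sketch below uses a dissimilarity derived from the Sørensen–Dice coefficient over character sets. That choice is an assumption made here for illustration and is not named in the text above; the helper names `dice_dissimilarity` and `violates_triangle` are likewise hypothetical.

```python
def dice_dissimilarity(a: str, b: str) -> float:
    """1 minus the Sørensen–Dice coefficient of the two strings' character sets."""
    x, y = set(a), set(b)
    if not x and not y:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - 2.0 * len(x & y) / (len(x) + len(y))


def violates_triangle(d, x: str, y: str, z: str) -> bool:
    """True if going from x to z directly is 'longer' than going via y."""
    return d(x, z) > d(x, y) + d(y, z)


# "a" and "b" share no characters, yet each is fairly close to "ab".
print(dice_dissimilarity("a", "b"))    # direct dissimilarity
print(dice_dissimilarity("a", "ab"))   # first leg of the detour
print(dice_dissimilarity("ab", "b"))   # second leg of the detour
print(violates_triangle(dice_dissimilarity, "a", "ab", "b"))
```

Here the direct dissimilarity between "a" and "b" is 1, while each leg through "ab" is 1/3, so the detour totals 2/3 and the triangle inequality fails. Such a function can still be useful for ranking similar strings, but it is not a metric in the mathematical sense.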