Levenshtein distance (idea) by Bantik

A concept named for Vladimir Levenshtein, who published the algorithm in the 1966 paper "Binary codes capable of correcting deletions, insertions and reversals".

Levenshtein distance is measure of the distance between two data sets in terms of data insertion, deletion, or substitution. Each edit activity that required to transform data set a into data set b is assigned a point value. Relative point values between different data sets can then be used to find data sets that are most similar.

For example, if you assign one point to each edit operation, then the Levenshtein distance between hermetic and memetics is 2: substitute "m" for "h", and delete "r".

If you have used a spell checker, then you've seen the Levenshtein algorithm at work. One of the most common applications is determining what you may have been trying to say when you typed "sharp eye for detal". Most decent spell checkers also take into consideration probabilities of certain common misspellings. Really sophisticated spell checkers use soundex databases in conjunction with Levenshtein computations to decipher phonetic spellings or simply really, really bad spelling errors.

Applications based on computing the Levenshtein distance between data sets are not limited to simple text operations. Other practical applications include speech recognition, handwriting recognition, plagiarism detection software, and even DNA analysis.

Levenshtein distance is not the same as Hamming distance; Hamming distance is only used to compare data sets of the same size.

hamming distance	Soundex	Smith Waterman algorithm	Needleman Wunsch algorithm
dynamic programming	Hausdorff distance	Levenshtein dreams	Logic Programming
memetics	distance	speech recognition	The C Programming Language
The Book of Books	January 3, 2012	edit distance	Bremermann Limit
Brennschluss	Computational biology	Lomo	Hermetic
Plagiarism	data set	DNA