One commonly used approach for measuring similarity between strings is the string edit distance. The basic premise of the string edit distance is to find the minimum number of character edit operations (copy, replace, add, and remove) needed to transform one string into the other. For example, the edit distance between parks and spark is two: one edit is needed to add a leading s to parks and a second edit is needed to delete the trailing s. Not suprisingly, string edit distance is used in spell-checking applications.

It is convenient to look at string edit distance as a string alignment problem. The example below should provide some insight into this:

 String 1:     p a r k s 
 String 2:   s p a r k   
 Operation:  i c c c c d 

The operation is the character operation needed to align or transform string 1 into string 2. i is insert, c is copy, r is replace, and d is delete. (You may find other references using s (substitute) instead of r (replace).) The space at the beginning of string 1 or at the end of string 2 is called a gap.

For a given set of strings, there may be many ways to align the strings. For example, the strings abc123 and abcqwertabc123 have two obvious alignments:

 a b c                 1 2 3
 a b c q w e r t a b c 1 2 3

and

                 a b c 1 2 3
 a b c q w e r t a b c 1 2 3

The particular application would determine which alignment is preferred.

Different distance metrics are used to select the desired alignment. For example, the measurement used in the original example of the cost between parks and spark is the Levenshtein distance. In this metric the cost of replacing, adding or deleting a character is one whereas the cost of equal (or copied) characters is zero. (In the conversion of parks to spark, the cost of copying letters p a r k is zero.) The second alignment above (where abc123 has no gap) can be obtained using the Smith-Waterman distance.

For simple strings it is relatively easy to understand how to transform one string into another but with longer strings or very dissimilar strings it is not so clear. For example:

 See Spot run
 We work and play

There are brute force techniques that consider all possible alignments and find the one(s) with the minimum cost. I refer you to a few good references on string edit distance and the associated dynamic programming algorithms for more information.