最喜欢的算法(们) - Levenshtein distance

Posted

tags:

篇首语:本文由小常识网(cha138.com)小编为大家整理,主要介绍了最喜欢的算法(们) - Levenshtein distance相关的知识,希望对你有一定的参考价值。

String Matching: Levenshtein distance

  • Purpose: to use as little effort to convert one string into the other
  • Intuition behind the method: replacement, addition or deletion of a charcter in a string
  • Steps

Step

Description

1

Set n to be the length of s.

Set m to be the length of t.

If n = 0, return m and exit.

If m = 0, return n and exit.

Construct a matrix containing 0..m rows and 0..n columns.

2

Initialize the first row to 0..n.

Initialize the first column to 0..m.

3

Examine each character of s (i from 1 to n).

4

Examine each character of t (j from 1 to m).

5

If s[i] equals t[j], the cost is 0.

If s[i] doesn‘t equal t[j], the cost is 1.

6

Set cell d[i,j] of the matrix equal to the minimum of:

a. The cell immediately above plus 1: d[i-1,j] + 1.

b. The cell immediately to the left plus 1: d[i,j-1] + 1.

c. The cell diagonally above and to the left plus the cost: d[i-1,j-1] + cost.

7

After the iteration steps (3, 4, 5, 6) are complete, the distance is found in cell d[n,m].

  • Example

This section shows how the Levenshtein distance is computed when the source string is "GUMBO" and the target string is "GAMBOL".

Steps 1 and 2

    G U M B O
  0 1 2 3 4 5
G 1          
A 2          
M 3          
B 4          
O 5          
L 6          

Steps 3 to 6 When i = 1

    G U M B O
  0 1 2 3 4 5
G 1 0        
A 2 1        
M 3 2        
B 4 3        
O 5 4        
L 6 5        

Steps 3 to 6 When i = 2

    G U M B O
  0 1 2 3 4 5
G 1 0 1      
A 2 1 1      
M 3 2 2      
B 4 3 3      
O 5 4 4      
L 6 5 5      

Steps 3 to 6 When i = 3

    G U M B O
  0 1 2 3 4 5
G 1 0 1 2    
A 2 1 1 2    
M 3 2 2 1    
B 4 3 3 2    
O 5 4 4 3    
L 6 5 5 4    

Steps 3 to 6 When i = 4

    G U M B O
  0 1 2 3 4 5
G 1 0 1 2 3  
A 2 1 1 2 3  
M 3 2 2 1 2  
B 4 3 3 2 1  
O 5 4 4 3 2  
L 6 5 5 4 3  

Steps 3 to 6 When i = 5

    G U M B O
  0 1 2 3 4 5
G 1 0 1 2 3 4
A 2 1 1 2 3 4
M 3 2 2 1 2 3
B 4 3 3 2 1 2
O 5 4 4 3 2 1
L 6 5 5 4 3 2

Step 7

The distance is in the lower right hand corner of the matrix, i.e. 2. This corresponds to our intuitive realization that "GUMBO" can be transformed into "GAMBOL" by substituting "A" for "U" and adding "L" (one substitution and 1 insertion = 2 changes).

 

以上是关于最喜欢的算法(们) - Levenshtein distance的主要内容,如果未能解决你的问题,请参考以下文章

C#实现Levenshtein distance最小编辑距离算法

Levenshtein距离编辑距离算法字符串相似度算法

Damerau-Levenshtein 距离实施

“模糊匹配”字符串的算法

如何调整 Levenshtein 距离算法以将匹配限制为单个单词?

尝试在 T-SQL 查询中使用 Levenshtein 距离 - 请帮助优化