Edit Distance
Category : Dynamic Programming
Description : Given two strings str1 and str2 and the operations below, which can be performed on str1, find the minimum number of edits (operations) required to convert str1 into str2.
- Insert
- Remove
- Replace
Attention : All of the above operations are of equal cost
Examples
Input: str1 = "geek", str2 = "gesek"
Output: 1
We can convert str1 into str2 by inserting an 's'.
Input: str1 = "cat", str2 = "cut"
Output: 1
We can convert str1 into str2 by replacing 'a' with 'u'.
Input: str1 = "sunday", str2 = "saturday"
Output: 3
The first character and the last three characters are the same, so we only need to convert "un" to "atur". This can be done with the three operations below.
Replace 'n' with 'r', insert 't', insert 'a'.
What are the subproblems in this case?
The idea is to process the characters one by one, starting from either the left or the right end of both strings.
Suppose we traverse from the right end. There are two possibilities for every pair of characters being traversed.
m: Length of str1
n: Length of str2
- If the last characters of the two strings are the same, there is nothing to do. Ignore the last characters and get the count for the remaining strings, so we recur for lengths m-1 and n-1.
- Else, we consider all three operations on the last character of the first string, recursively compute the minimum cost for each of them, and take the minimum of the three values.
- Insert : recur for m and n - 1
- Remove : recur for m - 1 and n
- Replace : recur for m - 1 and n - 1
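The cases above can be written as a single recurrence. This is only a restatement of the bullets; the symbol $D(i, j)$ is introduced here for illustration and stands for the edit distance between the first $i$ characters of str1 and the first $j$ characters of str2 (0-based indexing into the strings is assumed).

$$
D(i, j) =
\begin{cases}
j & \text{if } i = 0 \\
i & \text{if } j = 0 \\
D(i-1, j-1) & \text{if } \text{str1}[i-1] = \text{str2}[j-1] \\
1 + \min\bigl(D(i, j-1),\ D(i-1, j),\ D(i-1, j-1)\bigr) & \text{otherwise}
\end{cases}
$$

The answer is $D(m, n)$. For "cat" and "cut", the last characters match, so $D(3,3) = D(2,2)$; 'a' and 'u' differ, and the replace term $D(1,1) = D(0,0) = 0$ gives $D(2,2) = 1$.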
Naive recursive solution
#include <algorithm>
#include <iostream>
#include <string>
using namespace std;

// Utility function to find minimum of three numbers
int min(int x, int y, int z)
{
    return std::min(std::min(x, y), z);
}

// Returns the edit distance between the first m characters of str1
// and the first n characters of str2
int editDist(const string& str1, const string& str2, int m, int n)
{
    // If first string is empty, the only option is to
    // insert all characters of second string into first
    if (m == 0) return n;

    // If second string is empty, the only option is to
    // remove all characters of first string
    if (n == 0) return m;

    // If last characters of two strings are same, ignore them
    // and recur for the remaining strings
    if (str1[m-1] == str2[n-1])
        return editDist(str1, str2, m-1, n-1);

    // Last characters differ: try all three operations
    return 1 + min(editDist(str1, str2, m, n-1),   // insert
                   editDist(str1, str2, m-1, n),   // remove
                   editDist(str1, str2, m-1, n-1)  // replace
                  );
}

int main()
{
    string str1 = "sunday";
    string str2 = "saturday";
    cout << editDist(str1, str2, str1.length(), str2.length());
    return 0;
}
// Output
3
The time complexity of the above solution is exponential. Each call can make up to three recursive calls and the recursion depth can reach m + n, so in the worst case we may end up doing $O(3^{m+n})$ operations. The worst case happens when none of the characters of the two strings match. Below is a recursive call diagram for the worst case.
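The amount of repeated work can also be checked numerically. The sketch below is the same recursion with a call counter added; the names `editDistCounted` and `countCalls` are hypothetical helpers introduced only for this illustration.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: same recursion as the naive solution, but it
// increments `calls` once for every invocation.
int editDistCounted(const std::string& a, const std::string& b,
                    int m, int n, long long& calls)
{
    ++calls;
    if (m == 0) return n;
    if (n == 0) return m;
    if (a[m-1] == b[n-1])
        return editDistCounted(a, b, m-1, n-1, calls);
    return 1 + std::min({editDistCounted(a, b, m, n-1, calls),    // insert
                         editDistCounted(a, b, m-1, n, calls),    // remove
                         editDistCounted(a, b, m-1, n-1, calls)}); // replace
}

// Hypothetical helper: total number of recursive calls made for a and b.
long long countCalls(const std::string& a, const std::string& b)
{
    long long calls = 0;
    editDistCounted(a, b, static_cast<int>(a.size()),
                    static_cast<int>(b.size()), calls);
    return calls;
}
```

For two fully mismatched strings of length 3, such as "aaa" and "bbb", `countCalls` returns 94, although there are only 16 distinct (m, n) subproblems. For identical strings such as "abc" and "abc" it returns 4, because every step takes the matching branch.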
Like other typical Dynamic Programming (DP) problems, recomputation of the same subproblems can be avoided by constructing a temporary array that stores the results of subproblems.
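#include <algorithm>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Utility function to find minimum of three numbers
int min(int x, int y, int z)
{
    return std::min(std::min(x, y), z);
}

int editDistDP(const string& str1, const string& str2, int m, int n)
{
    // dp[i][j] = edit distance between the first i characters of str1
    // and the first j characters of str2.
    // A vector is used because variable-length arrays are not standard C++.
    vector<vector<int>> dp(m + 1, vector<int>(n + 1, 0));

    // Fill dp[][] in bottom up manner
    for (int i = 0; i <= m; i++)
    {
        for (int j = 0; j <= n; j++)
        {
            // First string is empty: insert all j characters
            if (i == 0)
                dp[i][j] = j;
            // Second string is empty: remove all i characters
            else if (j == 0)
                dp[i][j] = i;
            // Last characters are same: ignore them
            else if (str1[i-1] == str2[j-1])
                dp[i][j] = dp[i-1][j-1];
            // Last characters differ: try all three operations
            else
                dp[i][j] = 1 + min(dp[i][j-1],   // insert
                                   dp[i-1][j],   // remove
                                   dp[i-1][j-1]  // replace
                                  );
        }
    }
    return dp[m][n];
}

// Driver program
int main()
{
    string str1 = "sunday";
    string str2 = "saturday";
    cout << editDistDP(str1, str2, str1.length(), str2.length());
    return 0;
}
// Output
3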
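The same stored-results idea can also be applied top-down, keeping the shape of the naive recursion and caching each subproblem the first time it is solved. The following is a minimal sketch of that variant, not part of the original listing; `editDistMemoRec` and `editDistMemo` are hypothetical names, and -1 is assumed as the "not yet computed" marker.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: naive recursion plus a memo table.
// memo[i][j] == -1 means subproblem (i, j) has not been solved yet.
int editDistMemoRec(const std::string& a, const std::string& b, int i, int j,
                    std::vector<std::vector<int>>& memo)
{
    if (i == 0) return j;   // insert the remaining j characters
    if (j == 0) return i;   // remove the remaining i characters

    int& cached = memo[i][j];
    if (cached != -1) return cached;   // already solved, reuse the result

    if (a[i-1] == b[j-1]) {
        cached = editDistMemoRec(a, b, i-1, j-1, memo);
    } else {
        cached = 1 + std::min({editDistMemoRec(a, b, i, j-1, memo),    // insert
                               editDistMemoRec(a, b, i-1, j, memo),    // remove
                               editDistMemoRec(a, b, i-1, j-1, memo)}); // replace
    }
    return cached;
}

// Hypothetical wrapper: allocates the memo table and starts the recursion.
int editDistMemo(const std::string& a, const std::string& b)
{
    const int m = static_cast<int>(a.size());
    const int n = static_cast<int>(b.size());
    std::vector<std::vector<int>> memo(m + 1, std::vector<int>(n + 1, -1));
    return editDistMemoRec(a, b, m, n, memo);
}
```

Calling `editDistMemo("sunday", "saturday")` returns 3, matching the bottom-up version. Each (i, j) pair is solved at most once, so the work is bounded by the size of the table, at the cost of recursion depth up to m + n.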