The Levenshtein distance is a measure of the difference between two strings, calculated as the minimum number of operations needed to transform one string into the other. The possible operations are insertion, deletion, and replacement of a character. This algorithm is useful in text search, spell checking, bioinformatics, and other fields that require string comparison. In this article, we will see how to implement the Levenshtein distance algorithm in Go.
The Levenshtein distance between two strings s1
and s2
is based on three operations:
- Insertion of a character.
- Deletion of a character.
- Replacement of a character.
For example, the Levenshtein distance between "cat" and "act" is 1 (just delete the 'g'), while between "house" and "dear" it is 2 (replace 's' with 'r' and vice versa).
To calculate the Levenshtein distance, we can use an array to store the intermediate results, thus optimizing the algorithm with programming dynamic.
The algorithm:
- Create a
dp
matrix withm + 1
rows andn + 1
columns, wherem
andn
are the lengths of the stringss1
ands2
. - Initialize the first row and column with indices, as they represent the number of operations needed to convert an empty string to a string with
i
characters (insertion) or from a string withj
characters (deletion). - Fill the matrix:
- If the character of
s1
at positioni-1
is equal to that ofs2
at positionj-1
, then the cost is 0. - Otherwise, we compute the cost as the minimum of:
- Deletion cost,
- Insertion cost,
- Replacement cost.
- If the character of
The value at the bottom right of the matrix represents the Levenshtein distance.
Now let's see how to implement this algorithm in Go.
package main
import (
"fmt"
)
// Function to compute the Levenshtein distance between two strings
func Levenshtein(s1, s2 string) int {
m, n := len(s1), len(s2)
// Initialize the matrix dp of size (m+1)x(n+1)
dp := make([][]int, m+1)
for i := range dp {
dp[i] = make([]int, n+1)
}
// Fill the first row and column
for i := 0; i <= m; i++ {
dp[i][0] = i
}
for j := 0; j <= n; j++ {
dp[0][j] = j
}
// Fill the matrix dp with the minimum costs
for i := 1; i <= m; i++ {
for j := 1; j <= n; j++ {
if s1[i-1] == s2[j-1] {
dp[i][j] = dp[i-1][j-1]
} else {
dp[i][j] = min(
dp[i-1][j]+1, // Deletion
dp[i][j-1]+1, // Insertion
dp[i-1][j-1]+1, // Replacement
)
}
}
}
return dp[m][n]
}
// Utility function to find the minimum of three numbers
func min(a, b, c int) int {
if a < b && a < c {
return a
}
if b < c {
return b
}
return c
}
// Example of using the Levenshtein function
func main() {
s1 := "cat"
s2 := "act"
fmt.Printf("The Levenshtein distance between '%s' and '%s' is: %d\n", s1, s2, Levenshtein(s1, s2))
}
Explanation:
Matrix Initialization: The
dp
matrix of size(m+1) x (n+1)
is created with themake
package, and then each cell is initialized with appropriate values ββ(row and column index).Matrix Population: The function iterates on each character of both strings. If the corresponding characters are equal, the cell value is copied from the diagonal (zero cost). If they are different, the minimum cost of the three operations: insertion, deletion and replacement is calculated.
Final Result: The lower right cell of
dp
(dp[m][n]
) contains the Levenshtein distance between the two strings.
The current algorithm has complexity O(m * n)
in both space and time, where m
and n
are the lengths of the two strings. If you want to reduce the space complexity, you can use a one-dimensional array, keeping track of only the current and previous row or column, since we only use these values ββat each iteration.
Conclusion
Implementing the Levenshtein distance in Go is a fairly simple process thanks to the support for multidimensional arrays and string operations. This approach can be used in a variety of applications where string comparison is crucial.