How to implement the Levenshtein distance algorithm in Go

The Levenshtein distance is a measure of the difference between two strings, calculated as the minimum number of operations needed to transform one string into the other. The possible operations are insertion, deletion, and replacement of a character. This algorithm is useful in text search, spell checking, bioinformatics, and other fields that require string comparison. In this article, we will see how to implement the Levenshtein distance algorithm in Go.

The Levenshtein distance between two strings s1 and s2 is based on three operations:

  1. Insertion of a character.
  2. Deletion of a character.
  3. Replacement of a character.

For example, the Levenshtein distance between "cat" and "act" is 1 (just delete the 'g'), while between "house" and "dear" it is 2 (replace 's' with 'r' and vice versa).

To calculate the Levenshtein distance, we can use an array to store the intermediate results, thus optimizing the algorithm with programming dynamic.

The algorithm:

  1. Create a dp matrix with m + 1 rows and n + 1 columns, where m and n are the lengths of the strings s1 and s2.
  2. Initialize the first row and column with indices, as they represent the number of operations needed to convert an empty string to a string with i characters (insertion) or from a string with j characters (deletion).
  3. Fill the matrix:
    • If the character of s1 at position i-1 is equal to that of s2 at position j-1, then the cost is 0.
    • Otherwise, we compute the cost as the minimum of:
      • Deletion cost,
      • Insertion cost,
      • Replacement cost.

The value at the bottom right of the matrix represents the Levenshtein distance.

Now let's see how to implement this algorithm in Go.


package main

import (
  "fmt"
)

// Function to compute the Levenshtein distance between two strings
func Levenshtein(s1, s2 string) int {
  m, n := len(s1), len(s2)

  // Initialize the matrix dp of size (m+1)x(n+1)
  dp := make([][]int, m+1)
  for i := range dp {
    dp[i] = make([]int, n+1)
  }

  // Fill the first row and column
  for i := 0; i <= m; i++ {
    dp[i][0] = i
  }
  for j := 0; j <= n; j++ {
    dp[0][j] = j
  }

  // Fill the matrix dp with the minimum costs
  for i := 1; i <= m; i++ {
    for j := 1; j <= n; j++ {
      if s1[i-1] == s2[j-1] {
        dp[i][j] = dp[i-1][j-1]
      } else {
        dp[i][j] = min(
          dp[i-1][j]+1, // Deletion
          dp[i][j-1]+1, // Insertion
          dp[i-1][j-1]+1, // Replacement
       )
      }
    }
  }

  return dp[m][n]
}

// Utility function to find the minimum of three numbers
func min(a, b, c int) int {
 if a < b && a < c {
    return a
 }
 if b < c {
    return b
 }
 return c
}

// Example of using the Levenshtein function
func main() {
  s1 := "cat"
  s2 := "act"
  fmt.Printf("The Levenshtein distance between '%s' and '%s' is: %d\n", s1, s2, Levenshtein(s1,   s2))
}

Explanation:

  1. Matrix Initialization: The dp matrix of size (m+1) x (n+1) is created with the make package, and then each cell is initialized with appropriate values ​​(row and column index).

  2. Matrix Population: The function iterates on each character of both strings. If the corresponding characters are equal, the cell value is copied from the diagonal (zero cost). If they are different, the minimum cost of the three operations: insertion, deletion and replacement is calculated.

  3. Final Result: The lower right cell of dp (dp[m][n]) contains the Levenshtein distance between the two strings.

The current algorithm has complexity O(m * n) in both space and time, where m and n are the lengths of the two strings. If you want to reduce the space complexity, you can use a one-dimensional array, keeping track of only the current and previous row or column, since we only use these values ​​at each iteration.

Conclusion

Implementing the Levenshtein distance in Go is a fairly simple process thanks to the support for multidimensional arrays and string operations. This approach can be used in a variety of applications where string comparison is crucial.

Back to top