public class LevensteinDistance extends java.lang.Object implements StringSimilarity
The Levenstein edit distance is the minimum number of insertions, deletions, substitutions, and adjacent transpositions required to transform one string into another. The larger the Levenstein distance, the more the two strings differ.
The edit distance between two strings s1 and s2 can be converted to a similarity measure as follows:
    max_length = max( length of s1 , length of s2 )
    edit_distance = edit distance between s1 and s2
    similarity = 1.0 - ( edit_distance / max_length )
This implementation of Levenstein distance is based upon one by Michael Gilleland and Charles Emerick.
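The distance-to-similarity conversion described above can be illustrated with a small, self-contained sketch. It is not the source code of this class or of the Gilleland and Emerick implementation. The class name `EditDistanceSketch` is hypothetical, the distance routine is the textbook two-row dynamic-programming form that counts only insertions, deletions, and substitutions (an adjacent transposition costs two edits here), and returning 1.0 for two empty strings is an assumption made to avoid dividing by zero.

```java
// Illustrative sketch only; NOT the source of LevensteinDistance.
final class EditDistanceSketch {

    // Textbook two-row dynamic-programming edit distance.
    // Counts insertions, deletions, and substitutions, each at cost 1.
    static int editDistance(String s1, String s2) {
        int n = s1.length();
        int m = s2.length();
        int[] previous = new int[m + 1];
        int[] current = new int[m + 1];

        // Distance from the empty prefix of s1 to each prefix of s2.
        for (int j = 0; j <= m; j++) {
            previous[j] = j;
        }
        for (int i = 1; i <= n; i++) {
            current[0] = i; // delete i characters of s1
            for (int j = 1; j <= m; j++) {
                int cost = (s1.charAt(i - 1) == s2.charAt(j - 1)) ? 0 : 1;
                int insertion = current[j - 1] + 1;
                int deletion = previous[j] + 1;
                int substitution = previous[j - 1] + cost;
                current[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = previous;
            previous = current;
            current = tmp;
        }
        return previous[m];
    }

    // similarity = 1.0 - ( edit_distance / max_length )
    static double similarity(String s1, String s2) {
        int maxLength = Math.max(s1.length(), s2.length());
        if (maxLength == 0) {
            return 1.0; // assumption: two empty strings count as identical
        }
        // Cast to double so the division is not truncated to an integer.
        return 1.0 - ((double) editDistance(s1, s2) / maxLength);
    }

    public static void main(String[] args) {
        System.out.println(editDistance("kitten", "sitting"));
        System.out.println(similarity("kitten", "sitting"));
    }
}
```

For "kitten" and "sitting" the edit distance is 3 and the longer string has 7 characters, so the similarity is 1.0 - 3/7, roughly 0.571. The cast to `double` matters in Java: dividing two `int` values would truncate the quotient to 0 for any distance smaller than the maximum length.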
Constructor and Description |
---|
LevensteinDistance(): Create Levenstein distance instance. |
Modifier and Type | Method and Description |
---|---|
static boolean | areAlike(java.lang.String s1, java.lang.String s2): Determine whether two strings are alike based upon edit distance. |
static int | editDistance(java.lang.String s1, java.lang.String s2): Compute Levenstein edit distance between two strings. |
static double | levensteinSimilarity(java.lang.String s1, java.lang.String s2): Compute similarity between two strings. |
double | similarity(java.lang.String s1, java.lang.String s2): Compute Levenstein distance similarity of two strings. |
public LevensteinDistance()
public static int editDistance(java.lang.String s1, java.lang.String s2)
s1 - First string.
s2 - Second string.

public static double levensteinSimilarity(java.lang.String s1, java.lang.String s2)
s1 - First string.
s2 - Second string.

The similarity is computed from the edit distance between s1 and s2 as follows:

    max_length = max( length of s1 , length of s2 )
    edit_distance = edit distance between s1 and s2
    similarity = 1.0 - ( edit_distance / max_length )

public static boolean areAlike(java.lang.String s1, java.lang.String s2)
s1 - First string.
s2 - Second string.

public double similarity(java.lang.String s1, java.lang.String s2)
Specified by: similarity in interface StringSimilarity
s1 - First string.
s2 - Second string.
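
Because the instance method `similarity` is declared through an interface, calling code can be written against the interface and accept any similarity measure. The sketch below shows that pattern under stated assumptions: `SimilarityMeasure` is a hypothetical stand-in with the same method shape (it is not the library's `StringSimilarity` declaration), `ClosestMatchSketch` and `closest` are illustrative names, and the toy measure in `main` exists only to keep the example self-contained.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an interface exposing similarity(s1, s2).
interface SimilarityMeasure {
    double similarity(String s1, String s2);
}

// Illustrative helper that depends only on the interface.
final class ClosestMatchSketch {

    // Returns the candidate most similar to target, or null if the list is empty.
    // On ties, the earliest candidate in the list wins.
    static String closest(SimilarityMeasure measure, String target, List<String> candidates) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String candidate : candidates) {
            double score = measure.similarity(target, candidate);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy measure (assumption): 1.0 for equal strings, 0.0 otherwise.
        SimilarityMeasure exactMatch = (a, b) -> a.equals(b) ? 1.0 : 0.0;
        List<String> candidates = Arrays.asList("colour", "color", "collar");
        System.out.println(closest(exactMatch, "color", candidates));
    }
}
```

An edit-distance-based measure could presumably be passed in place of the toy lambda, which is the practical reason to offer an instance method alongside the static ones.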