casa
$Rev:20696$
Class to deal with Levenshtein distance of strings.
#include <StringDistance.h>
Public Member Functions
StringDistance ()
Default constructor sets maxDistance to 0.
StringDistance (const String &source, Int maxDistance=-1, Bool countSwaps=True, Bool ignoreBlanks=True, Bool caseInsensitive=False)
Construct from the source string and maximum distance.
const string & source () const
Get data members.
Int maxDistance () const
const Matrix< Int > & matrix () const
Bool match (const String &target) const
Test if the given target string is within the maximum distance.
Int distance (const String &target) const
Calculate the distance from the string to the string given in the constructor.
Static Public Member Functions
static Int distance (const String &source, const String &target, Bool countSwaps=True)
Calculate the distance between the two strings.
static String removeBlanks (const String &source)
Remove blanks from the given string.
Static Private Member Functions
static Int doDistance (const String &source, const String &target, Bool countSwaps, Matrix< Int > &matrix)
Calculate the distance.
Private Attributes
String itsSource
Matrix< Int > itsMatrix
Int itsMaxDistance
Bool itsCountSwaps
Bool itsIgnoreBlanks
Bool itsCaseInsensitive
Class to deal with Levenshtein distance of strings.
The Levenshtein Distance is a metric telling how similar strings are. It is also known as the Edit Distance.
The distance tells how many operations (i.e., character substitutions, insertions, and deletions) are needed to transform one string into another.
There are several extensions to the basic definition.
This class optionally uses the swap extension, in which a swap of two adjacent characters counts as a single operation. Furthermore, one can optionally ignore blanks. By default both options are used.
The code is based on code written by Anders Sewerin Johansen. Calculating the distance is an expensive O(N^2) operation, so it should be used with care.
The class is constructed with the source string to compare against. Thereafter its match or distance function can be used for each target string.
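To make the description above concrete, the following is a minimal, self-contained sketch of an edit distance with an optional adjacent-swap operation. It is not the casa implementation: the function name `editDistance`, the use of `std::string` and `std::vector`, and the exact swap rule (the "optimal string alignment" variant) are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of casa: edit distance with an optional
// adjacent-swap operation (assumed to be the "optimal string alignment" rule).
int editDistance(const std::string& source, const std::string& target,
                 bool countSwaps = true)
{
    const std::size_t n = source.size();
    const std::size_t m = target.size();
    // d[i][j] = distance between the first i chars of source
    // and the first j chars of target.
    std::vector<std::vector<int>> d(n + 1, std::vector<int>(m + 1, 0));
    for (std::size_t i = 0; i <= n; ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= m; ++j) d[0][j] = static_cast<int>(j);

    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int cost = (source[i - 1] == target[j - 1]) ? 0 : 1;
            int best = std::min({d[i - 1][j] + 1,           // deletion
                                 d[i][j - 1] + 1,           // insertion
                                 d[i - 1][j - 1] + cost});  // substitution or match
            if (countSwaps && i > 1 && j > 1 &&
                source[i - 1] == target[j - 2] &&
                source[i - 2] == target[j - 1]) {
                // Two adjacent characters exchanged: count as one operation.
                best = std::min(best, d[i - 2][j - 2] + 1);
            }
            d[i][j] = best;
        }
    }
    return d[n][m];
}
```

The table has (N+1) x (M+1) cells and every cell is visited once, which is where the quadratic cost mentioned above comes from. With the swap rule enabled, "abcd" and "abdc" are one operation apart; without it they are two substitutions apart.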
Definition at line 68 of file StringDistance.h.
Default constructor sets maxDistance to 0.
casa::StringDistance::StringDistance (const String &source, Int maxDistance=-1, Bool countSwaps=True, Bool ignoreBlanks=True, Bool caseInsensitive=False) [explicit]
Construct from the source string and maximum distance.
If the maximum distance is negative, it defaults to 1 + strlength/3. Note that a maximum distance of 0 means that the strings must match exactly.
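The default rule can be sketched as a small helper. This is an illustration only: the name `effectiveMaxDistance` is hypothetical, and it assumes that strlength refers to the length of the source string and that the division is integer division.

```cpp
#include <cstddef>

// Hypothetical helper mirroring the rule stated above.
// Assumptions: strlength is the source length; integer division is used.
int effectiveMaxDistance(std::size_t sourceLength, int maxDistance = -1)
{
    if (maxDistance < 0) {
        return 1 + static_cast<int>(sourceLength / 3);
    }
    return maxDistance;  // 0 means only an exact match is accepted
}
```

Under these assumptions a 9-character source would tolerate up to 4 edits by default, while a 2-character source would tolerate 1.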
Int casa::StringDistance::distance (const String &target) const
Calculate the distance from the string to the string given in the constructor.
If the length of target exceeds source length + maxDistance, the difference in lengths is returned.
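The shortcut works because each insertion or deletion changes the length by exactly one, so the edit distance can never be smaller than the difference in lengths. A minimal sketch of the check follows; the function name `lengthShortcut` and the -1 sentinel are hypothetical, and maxDistance is assumed to be non-negative at this point.

```cpp
#include <cstddef>

// Hypothetical sketch of the length shortcut described above.
// Returns the length difference when the shortcut applies, or -1 when the
// full distance calculation would still be needed.
// Assumption: maxDistance is non-negative.
int lengthShortcut(std::size_t sourceLength, std::size_t targetLength,
                   int maxDistance)
{
    if (targetLength > sourceLength + static_cast<std::size_t>(maxDistance)) {
        // The target is too long to be within maxDistance of the source.
        return static_cast<int>(targetLength - sourceLength);
    }
    return -1;
}
```

In this sketch the returned difference always exceeds maxDistance, so a caller comparing the result against maxDistance still rejects the target without filling the table.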
static Int casa::StringDistance::distance (const String &source, const String &target, Bool countSwaps=True) [static]
Calculate the distance between the two strings.
This is slower than the distance member function, because it has to allocate the underlying Matrix for each invocation.
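The difference between the two forms can be illustrated with a stand-in that uses a flat `std::vector<int>` instead of a casa Matrix. All names here (`levenshteinInto`, `distanceOnce`, `ReusingDistance`) are hypothetical, and the sketch leaves out the swap extension for brevity.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical core routine: fills a caller-supplied flat (n+1) x (m+1) table.
int levenshteinInto(const std::string& s, const std::string& t,
                    std::vector<int>& table)
{
    const std::size_t n = s.size();
    const std::size_t m = t.size();
    const std::size_t cols = m + 1;
    // Grow the buffer only when it is too small; otherwise reuse it as is.
    if (table.size() < (n + 1) * cols) table.resize((n + 1) * cols);
    for (std::size_t i = 0; i <= n; ++i) table[i * cols] = static_cast<int>(i);
    for (std::size_t j = 0; j <= m; ++j) table[j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= n; ++i) {
        for (std::size_t j = 1; j <= m; ++j) {
            const int cost = (s[i - 1] == t[j - 1]) ? 0 : 1;
            table[i * cols + j] = std::min({table[(i - 1) * cols + j] + 1,
                                            table[i * cols + (j - 1)] + 1,
                                            table[(i - 1) * cols + (j - 1)] + cost});
        }
    }
    return table[n * cols + m];
}

// One-shot form: a fresh table is allocated on every call.
int distanceOnce(const std::string& s, const std::string& t)
{
    std::vector<int> table;
    return levenshteinInto(s, t, table);
}

// Reusing form: the table is kept between calls, as a mutable member.
class ReusingDistance {
public:
    explicit ReusingDistance(const std::string& source) : source_(source) {}
    int distance(const std::string& target) const
    {
        return levenshteinInto(source_, target, table_);
    }
private:
    std::string source_;
    mutable std::vector<int> table_;
};
```

When many targets are compared against one source, the reusing form pays for allocation only when a longer target requires a larger table.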
static Int casa::StringDistance::doDistance (const String &source, const String &target, Bool countSwaps, Matrix< Int > &matrix) [static, private]
Calculate the distance.
Bool casa::StringDistance::match (const String &target) const
Test if the given target string is within the maximum distance.
Referenced by casa::TaqlRegex::match().
const Matrix<Int>& casa::StringDistance::matrix () const [inline]
Definition at line 87 of file StringDistance.h.
References itsMatrix.
Int casa::StringDistance::maxDistance () const [inline]
Definition at line 85 of file StringDistance.h.
References itsMaxDistance.
static String casa::StringDistance::removeBlanks (const String &source) [static]
Remove blanks from the given string.
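One plausible way to strip blanks is the erase-remove idiom on a copy of the input. This is a sketch, not the casa code: the name `removeBlanksSketch` is hypothetical, and it assumes that only the space character counts as a blank.

```cpp
#include <algorithm>
#include <string>

// Hypothetical sketch; assumes a "blank" is the space character ' ' only.
std::string removeBlanksSketch(const std::string& source)
{
    std::string result(source);
    // std::remove shifts the kept characters forward; erase trims the tail.
    result.erase(std::remove(result.begin(), result.end(), ' '), result.end());
    return result;
}
```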
const string& casa::StringDistance::source () const [inline]
Bool casa::StringDistance::itsCaseInsensitive [private]
Definition at line 120 of file StringDistance.h.
Bool casa::StringDistance::itsCountSwaps [private]
Definition at line 118 of file StringDistance.h.
Bool casa::StringDistance::itsIgnoreBlanks [private]
Definition at line 119 of file StringDistance.h.
Matrix<Int> casa::StringDistance::itsMatrix [mutable, private]
Definition at line 116 of file StringDistance.h.
Referenced by matrix().
Int casa::StringDistance::itsMaxDistance [private]
Definition at line 117 of file StringDistance.h.
Referenced by maxDistance().
String casa::StringDistance::itsSource [private]
Definition at line 115 of file StringDistance.h.
Referenced by source().