edu.northwestern.at.utils.corpuslinguistics.spellingstandardizer
Interface SpellingStandardizer

All Known Implementing Classes:
AbstractSpellingStandardizer, DecruftifyingSpellingStandardizer, DefaultSpellingStandardizer, ExtendedSearchSpellingStandardizer, ExtendedSimpleSpellingStandardizer, NoopSpellingStandardizer, RemoteSpellingStandardizer, SimpleSpellingStandardizer

public interface SpellingStandardizer

Interface for a spelling standardizer, which maps alternate spellings to standard spellings.


Method Summary
 void addMappedSpelling(java.lang.String alternateSpelling, java.lang.String standardSpelling)
          Adds a mapping from an alternate spelling to its standard spelling.
 void addStandardSpelling(java.lang.String standardSpelling)
          Add a standard spelling.
 void addStandardSpellings(java.util.Collection<java.lang.String> standardSpellings)
          Add standard spellings from a collection.
 java.lang.String fixCapitalization(java.lang.String spelling, java.lang.String standardSpelling)
          Fix capitalization of standardized spelling.
 TaggedStrings getMappedSpellings()
          Return the spelling map.
 int getNumberOfAlternateSpellings()
          Returns number of alternate spellings.
 int[] getNumberOfAlternateSpellingsByWordClass()
          Returns number of alternate spellings by word class.
 int getNumberOfStandardSpellings()
          Returns number of standard spellings.
 java.util.Set<java.lang.String> getStandardSpellings()
          Return the standard spellings.
 void loadAlternativeSpellings(java.io.Reader reader, java.lang.String delimChars)
          Loads alternate spellings from a reader.
 void loadAlternativeSpellings(java.net.URL url, java.lang.String encoding, java.lang.String delimChars)
          Loads alternate spellings from a URL.
 void loadAlternativeSpellingsByWordClass(java.net.URL url, java.lang.String encoding)
          Loads alternate-to-standard spelling mappings by word class.
 void loadStandardSpellings(java.io.Reader reader)
          Loads standard spellings from a reader.
 void loadStandardSpellings(java.net.URL url, java.lang.String encoding)
          Loads standard spellings from a URL.
 java.lang.String preprocessSpelling(java.lang.String spelling)
          Preprocess spelling.
 void setMappedSpellings(TaggedStrings standardMappedSpellings)
          Sets the map from alternate spellings to standard spellings.
 void setStandardSpellings(java.util.Set<java.lang.String> standardSpellings)
          Sets standard spellings.
 java.lang.String[] standardizeSpelling(java.lang.String spelling)
          Returns standard spellings given a spelling.
 java.lang.String standardizeSpelling(java.lang.String spelling, java.lang.String wordClass)
          Returns a standard spelling given a standard or alternate spelling.
 

Method Detail

loadAlternativeSpellings

void loadAlternativeSpellings(java.net.URL url,
                              java.lang.String encoding,
                              java.lang.String delimChars)
                              throws java.io.IOException
Loads alternate spellings from a URL.

Parameters:
url - URL of a resource containing mappings from alternate spellings to standard spellings.
encoding - Character set encoding for spellings.
delimChars - Delimiter characters separating spelling pairs.
Throws:
java.io.IOException

loadAlternativeSpellings

void loadAlternativeSpellings(java.io.Reader reader,
                              java.lang.String delimChars)
                              throws java.io.IOException
Loads alternate spellings from a reader.

Parameters:
reader - The reader from which the alternate spellings are read.
delimChars - Delimiter characters separating spelling pairs.
Throws:
java.io.IOException
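
As an illustration of what the reader-based loader might consume, here is a minimal sketch that parses delimiter-separated spelling pairs. The line layout (one pair per line, alternate spelling first, standard spelling second) is an assumption, since this page does not specify the file format, and `AlternateSpellingPairsSketch` and `parsePairs` are hypothetical names that are not part of this interface.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class AlternateSpellingPairsSketch {

    // Hypothetical helper, not part of SpellingStandardizer.
    // Assumes each line has the form "<alternate><delimiter><standard>".
    public static Map<String, String> parsePairs(Reader reader, String delimChars) {
        Map<String, String> pairs = new LinkedHashMap<>();
        // BufferedReader.lines() reports read failures as UncheckedIOException,
        // whereas the interface method declares IOException.
        new BufferedReader(reader).lines().forEach(line -> {
            StringTokenizer tokens = new StringTokenizer(line, delimChars);
            // Skip blank or malformed lines that lack two fields.
            if (tokens.countTokens() >= 2) {
                String alternate = tokens.nextToken().trim();
                String standard = tokens.nextToken().trim();
                pairs.put(alternate, standard);
            }
        });
        return pairs;
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("vertue\tvirtue\nlooke\tlook\n");
        // Prints {vertue=virtue, looke=look}
        System.out.println(parsePairs(reader, "\t"));
    }
}
```

Each character in `delimChars` is treated as a separator here, which matches how `java.util.StringTokenizer` interprets its delimiter string; an implementing class may split lines differently.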

loadAlternativeSpellingsByWordClass

void loadAlternativeSpellingsByWordClass(java.net.URL url,
                                         java.lang.String encoding)
                                         throws java.io.IOException
Loads alternate-to-standard spelling mappings by word class.

Parameters:
url - URL of alternative spellings by word class.
encoding - Character set encoding for spellings.
Throws:
java.io.IOException

loadStandardSpellings

void loadStandardSpellings(java.net.URL url,
                           java.lang.String encoding)
                           throws java.io.IOException
Loads standard spellings from a URL.

Parameters:
url - URL containing standard spellings.
encoding - Character set encoding for spellings.
Throws:
java.io.IOException

loadStandardSpellings

void loadStandardSpellings(java.io.Reader reader)
                           throws java.io.IOException
Loads standard spellings from a reader.

Parameters:
reader - The reader from which the standard spellings are read.
Throws:
java.io.IOException
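
A standard spellings source could be as simple as a word list. The sketch below assumes one spelling per line, which this page does not confirm, and collects the non-blank lines into a set. `StandardSpellingsReaderSketch` and `readSpellings` are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;

public class StandardSpellingsReaderSketch {

    // Hypothetical helper, not part of SpellingStandardizer.
    // Assumes the input holds one standard spelling per line.
    public static Set<String> readSpellings(Reader reader) {
        Set<String> spellings = new TreeSet<>();
        new BufferedReader(reader).lines()
            .map(String::trim)                  // drop surrounding whitespace
            .filter(line -> !line.isEmpty())    // ignore blank lines
            .forEach(spellings::add);
        return spellings;
    }

    public static void main(String[] args) {
        Reader reader = new StringReader("virtue\n\nlook\nvirtue\n");
        // Prints [look, virtue]
        System.out.println(readSpellings(reader));
    }
}
```

The resulting set could then be passed to `setStandardSpellings` or `addStandardSpellings` on an implementing class.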

setMappedSpellings

void setMappedSpellings(TaggedStrings standardMappedSpellings)
Sets the map from alternate spellings to standard spellings.

Parameters:
standardMappedSpellings - TaggedStrings with alternate spellings as keys and standard spellings as tag values.

setStandardSpellings

void setStandardSpellings(java.util.Set<java.lang.String> standardSpellings)
Sets standard spellings.

Parameters:
standardSpellings - Set of standard spellings.

addMappedSpelling

void addMappedSpelling(java.lang.String alternateSpelling,
                       java.lang.String standardSpelling)
Adds a mapping from an alternate spelling to its standard spelling.

Parameters:
alternateSpelling - The alternate spelling.
standardSpelling - The corresponding standard spelling.

addStandardSpelling

void addStandardSpelling(java.lang.String standardSpelling)
Add a standard spelling.

Parameters:
standardSpelling - A standard spelling.

addStandardSpellings

void addStandardSpellings(java.util.Collection<java.lang.String> standardSpellings)
Add standard spellings from a collection.

Parameters:
standardSpellings - A collection of standard spellings.

preprocessSpelling

java.lang.String preprocessSpelling(java.lang.String spelling)
Preprocess spelling.

Parameters:
spelling - Spelling to preprocess before standardization.
Returns:
Preprocessed spelling, ready for standardization.

standardizeSpelling

java.lang.String[] standardizeSpelling(java.lang.String spelling)
Returns standard spellings given a spelling.

Parameters:
spelling - The spelling.
Returns:
The standard spellings as an array of String.

standardizeSpelling

java.lang.String standardizeSpelling(java.lang.String spelling,
                                     java.lang.String wordClass)
Returns a standard spelling given a standard or alternate spelling.

Parameters:
spelling - The spelling.
wordClass - The word class.
Returns:
The standard spelling.
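
One plausible way to satisfy the single-result contract is a plain table lookup. The sketch below is an assumption about how a simple standardizer might behave, not the behavior of any listed implementing class: it returns the input when it is already a standard spelling, returns the mapped standard spelling when one is known, and otherwise falls back to the input unchanged. It ignores the word class entirely, and `MapLookupStandardizerSketch` is a hypothetical class that does not implement this interface.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class MapLookupStandardizerSketch {

    private final Set<String> standardSpellings = new HashSet<>();
    private final Map<String, String> mappedSpellings = new HashMap<>();

    public void addStandardSpelling(String standardSpelling) {
        standardSpellings.add(standardSpelling);
    }

    public void addMappedSpelling(String alternateSpelling, String standardSpelling) {
        mappedSpellings.put(alternateSpelling, standardSpelling);
    }

    // Hypothetical lookup: the fallback to the original spelling is an assumption.
    public String standardize(String spelling) {
        if (standardSpellings.contains(spelling)) {
            return spelling;                     // already standard
        }
        String mapped = mappedSpellings.get(spelling);
        return mapped != null ? mapped : spelling;
    }

    public static void main(String[] args) {
        MapLookupStandardizerSketch sketch = new MapLookupStandardizerSketch();
        sketch.addStandardSpelling("virtue");
        sketch.addMappedSpelling("vertue", "virtue");
        System.out.println(sketch.standardize("vertue"));   // prints virtue
        System.out.println(sketch.standardize("virtue"));   // prints virtue
        System.out.println(sketch.standardize("zounds"));   // prints zounds
    }
}
```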

fixCapitalization

java.lang.String fixCapitalization(java.lang.String spelling,
                                   java.lang.String standardSpelling)
Fix capitalization of standardized spelling.

Parameters:
spelling - The original spelling.
standardSpelling - The candidate standard spelling.
Returns:
Standard spelling with initial capitalization matching original spelling.
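
The documented return value suggests that only the first character is adjusted. A minimal sketch under that assumption follows: if the original spelling starts with an uppercase letter, the first letter of the standard spelling is uppercased, and otherwise the standard spelling is returned unchanged. The handling of null or empty input is also an assumption, and `InitialCapitalizationSketch` and `matchInitialCapitalization` are hypothetical names.

```java
public class InitialCapitalizationSketch {

    // Hypothetical helper, not part of SpellingStandardizer.
    public static String matchInitialCapitalization(String spelling, String standardSpelling) {
        // Nothing to adjust when either argument is missing or empty.
        if (spelling == null || spelling.isEmpty()
                || standardSpelling == null || standardSpelling.isEmpty()) {
            return standardSpelling;
        }
        if (Character.isUpperCase(spelling.charAt(0))) {
            // Uppercase only the first character of the standard spelling.
            return Character.toUpperCase(standardSpelling.charAt(0))
                + standardSpelling.substring(1);
        }
        return standardSpelling;
    }

    public static void main(String[] args) {
        System.out.println(matchInitialCapitalization("Vertue", "virtue")); // prints Virtue
        System.out.println(matchInitialCapitalization("vertue", "virtue")); // prints virtue
    }
}
```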

getNumberOfAlternateSpellings

int getNumberOfAlternateSpellings()
Returns number of alternate spellings.

Returns:
The number of alternate spellings.

getNumberOfAlternateSpellingsByWordClass

int[] getNumberOfAlternateSpellingsByWordClass()
Returns number of alternate spellings by word class.

Returns:
int array with two entries: [0] = the number of word classes that have alternate spellings; [1] = the number of alternate spellings in those word classes.
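
To make the two-entry result concrete, the sketch below computes it from a nested map keyed by word class. Both the nested-map storage and the reading of entry [1] as the total across all word classes are assumptions; `WordClassCountsSketch` and `countByWordClass` are hypothetical names, and the word class labels are made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class WordClassCountsSketch {

    // Hypothetical helper, not part of SpellingStandardizer.
    // Outer key: word class. Inner map: alternate spelling -> standard spelling.
    public static int[] countByWordClass(Map<String, Map<String, String>> byWordClass) {
        int alternateSpellings = 0;
        for (Map<String, String> spellings : byWordClass.values()) {
            alternateSpellings += spellings.size();
        }
        // [0] = number of word classes, [1] = total alternate spellings.
        return new int[] { byWordClass.size(), alternateSpellings };
    }

    public static void main(String[] args) {
        Map<String, String> nouns = new HashMap<>();
        nouns.put("vertue", "virtue");
        nouns.put("booke", "book");

        Map<String, String> verbs = new HashMap<>();
        verbs.put("doe", "do");

        Map<String, Map<String, String>> byWordClass = new HashMap<>();
        byWordClass.put("noun", nouns);
        byWordClass.put("verb", verbs);

        // Prints [2, 3]
        System.out.println(Arrays.toString(countByWordClass(byWordClass)));
    }
}
```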

getNumberOfStandardSpellings

int getNumberOfStandardSpellings()
Returns number of standard spellings.

Returns:
The number of standard spellings.

getMappedSpellings

TaggedStrings getMappedSpellings()
Return the spelling map.

Returns:
The spelling map as a TaggedStrings object. May be null if this standardizer does not use such a map.

getStandardSpellings

java.util.Set<java.lang.String> getStandardSpellings()
Return the standard spellings.

Returns:
The standard spellings as a Set. May be null.