edu.northwestern.at.utils.corpuslinguistics.spellingstandardizer
Class ExtendedSearchSpellingStandardizer

java.lang.Object
  extended by edu.northwestern.at.utils.IsCloseableObject
      extended by edu.northwestern.at.utils.corpuslinguistics.spellingstandardizer.AbstractSpellingStandardizer
          extended by edu.northwestern.at.utils.corpuslinguistics.spellingstandardizer.SimpleSpellingStandardizer
              extended by edu.northwestern.at.utils.corpuslinguistics.spellingstandardizer.ExtendedSimpleSpellingStandardizer
                  extended by edu.northwestern.at.utils.corpuslinguistics.spellingstandardizer.ExtendedSearchSpellingStandardizer
All Implemented Interfaces:
SpellingStandardizer, UsesLogger

public class ExtendedSearchSpellingStandardizer
extends ExtendedSimpleSpellingStandardizer
implements SpellingStandardizer

ExtendedSearchSpellingStandardizer: extended search spelling standardizer.

ExtendedSearchSpellingStandardizer uses spelling correction methods, backed by its spellingChecker and doubleMetaphone fields, to build a list of suggested standardized spellings for a given spelling.


Field Summary
protected  DoubleMetaphone doubleMetaphone
          Double Metaphone encoder.
protected  SpellingChecker spellingChecker
          Spelling checker.
 
Fields inherited from class edu.northwestern.at.utils.corpuslinguistics.spellingstandardizer.ExtendedSimpleSpellingStandardizer
gapFiller
 
Fields inherited from class edu.northwestern.at.utils.corpuslinguistics.spellingstandardizer.AbstractSpellingStandardizer
alternateSpellingsWordClasses, defaultSpellingsByWordClassFileName, lexicon, logger, mappedSpellings, spellingsByWordClass, standardSpellingSet
 
Constructor Summary
ExtendedSearchSpellingStandardizer()
          Create extended search spelling standardizer.
 
Method Summary
 java.lang.String[] applyHeuristics(java.lang.String spelling)
          Apply heuristics to a spelling to see if we can find a match.
 void createDictionaries()
          Creates dictionaries from spelling lists.
 java.lang.String getBestSuggestedSpelling(java.lang.String spelling)
          Get best suggested spelling.
 java.util.List<ScoredString> getScoredSuggestedSpellings(java.lang.String spelling)
          Return suggested spellings with scores.
 java.lang.String[] getSuggestedSpellings(java.lang.String spelling)
          Return suggested spellings.
 void loadAlternativeSpellings(java.io.Reader reader, java.lang.String delimChars)
          Loads alternative spellings from a reader.
 void loadStandardSpellings(java.io.Reader reader)
          Loads standard spellings from a reader.
 java.lang.String longSVariant(java.lang.String spelling)
          Apply "long s" heuristics to a spelling.
 java.lang.String preprocessSpelling(java.lang.String spelling)
          Preprocess spelling.
 java.lang.String simpleReplacement(java.lang.String spelling, java.lang.String pattern, java.lang.String replacement)
          Apply simple string replacement.
 java.lang.String[] standardizeSpelling(java.lang.String spelling)
          Returns standard spellings given a spelling.
 java.lang.String toString()
          Return standardizer description.
 
Methods inherited from class edu.northwestern.at.utils.corpuslinguistics.spellingstandardizer.ExtendedSimpleSpellingStandardizer
doStandardizeSpelling, fixGaps
 
Methods inherited from class edu.northwestern.at.utils.corpuslinguistics.spellingstandardizer.AbstractSpellingStandardizer
addCachedSpelling, addMappedSpelling, addStandardSpelling, addStandardSpellings, fixCapitalization, getLexicon, getLogger, getMappedSpellings, getNumberOfAlternateSpellings, getNumberOfAlternateSpellingsByWordClass, getNumberOfStandardSpellings, getStandardSpellings, loadAlternativeSpellings, loadAlternativeSpellingsByWordClass, loadStandardSpellings, setLexicon, setLogger, setMappedSpellings, setStandardSpellings, standardizeSpelling
 
Methods inherited from class edu.northwestern.at.utils.IsCloseableObject
close
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 
Methods inherited from interface edu.northwestern.at.utils.corpuslinguistics.spellingstandardizer.SpellingStandardizer
addMappedSpelling, addStandardSpelling, addStandardSpellings, fixCapitalization, getMappedSpellings, getNumberOfAlternateSpellings, getNumberOfAlternateSpellingsByWordClass, getNumberOfStandardSpellings, getStandardSpellings, loadAlternativeSpellings, loadAlternativeSpellingsByWordClass, loadStandardSpellings, setMappedSpellings, setStandardSpellings, standardizeSpelling
 

Field Detail

spellingChecker

protected SpellingChecker spellingChecker
Spelling checker.


doubleMetaphone

protected DoubleMetaphone doubleMetaphone
Double Metaphone encoder.

Constructor Detail

ExtendedSearchSpellingStandardizer

public ExtendedSearchSpellingStandardizer()
Create extended search spelling standardizer.

Method Detail

createDictionaries

public void createDictionaries()
Creates dictionaries from spelling lists.


loadAlternativeSpellings

public void loadAlternativeSpellings(java.io.Reader reader,
                                     java.lang.String delimChars)
                              throws java.io.IOException
Loads alternative spellings from a reader.

Specified by:
loadAlternativeSpellings in interface SpellingStandardizer
Overrides:
loadAlternativeSpellings in class AbstractSpellingStandardizer
Parameters:
reader - The reader.
delimChars - Delimiter characters separating spelling pairs.
Throws:
java.io.IOException
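
The following is a minimal, self-contained sketch of how spelling pairs might be read from a reader using a set of delimiter characters. It is not the library's implementation. The class name AlternateSpellingLoaderSketch, the method loadPairs, and the one-pair-per-line layout with the alternate spelling first are all assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

// Hypothetical helper, not part of the documented package.
final class AlternateSpellingLoaderSketch {

    // Reads one "alternate<delim>standard" pair per line (assumed layout).
    static Map<String, String> loadPairs(Reader reader, String delimChars)
            throws IOException {
        Map<String, String> pairs = new LinkedHashMap<>();
        BufferedReader buffered = new BufferedReader(reader);
        String line;
        while ((line = buffered.readLine()) != null) {
            // Every character in delimChars acts as a separator.
            StringTokenizer tokens = new StringTokenizer(line, delimChars);
            // Skip blank or malformed lines that do not hold a full pair.
            if (tokens.countTokens() < 2) {
                continue;
            }
            String alternate = tokens.nextToken().trim();
            String standard = tokens.nextToken().trim();
            pairs.put(alternate, standard);
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        Reader reader = new StringReader("loue\tlove\nvertue\tvirtue\n");
        System.out.println(loadPairs(reader, "\t")); // prints {loue=love, vertue=virtue}
    }
}
```

With the real class, the equivalent call would pass the same reader and delimiter string to loadAlternativeSpellings.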

loadStandardSpellings

public void loadStandardSpellings(java.io.Reader reader)
                           throws java.io.IOException
Loads standard spellings from a reader.

Specified by:
loadStandardSpellings in interface SpellingStandardizer
Overrides:
loadStandardSpellings in class AbstractSpellingStandardizer
Parameters:
reader - The reader.
Throws:
java.io.IOException

applyHeuristics

public java.lang.String[] applyHeuristics(java.lang.String spelling)
Apply heuristics to a spelling to see if we can find a match.

Parameters:
spelling - Spelling to which to apply heuristics.
Returns:
Near matches after applying heuristics.

simpleReplacement

public java.lang.String simpleReplacement(java.lang.String spelling,
                                          java.lang.String pattern,
                                          java.lang.String replacement)
Apply simple string replacement.

Parameters:
spelling - The spelling.
pattern - String of characters to look for in spelling.
replacement - Replacement characters.
Returns:
The revised spelling if it appears in the spelling map; otherwise an empty string.
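
The contract described above (replace, then accept the result only if it is a known spelling) can be sketched as follows. This is an illustration under assumptions, not the library's code: the class SimpleReplacementSketch, the extra knownSpellings parameter standing in for the spelling map, and the choice of literal replacement of every occurrence are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in, not part of the documented package.
final class SimpleReplacementSketch {

    // Replaces every literal occurrence of pattern, then keeps the result
    // only if it is a known spelling; otherwise returns an empty string.
    static String simpleReplacement(String spelling, String pattern,
                                    String replacement,
                                    Set<String> knownSpellings) {
        String revised = spelling.replace(pattern, replacement);
        return knownSpellings.contains(revised) ? revised : "";
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("love", "virtue"));
        System.out.println(simpleReplacement("loue", "u", "v", known)); // prints love
        System.out.println(simpleReplacement("houe", "u", "v", known).isEmpty()); // prints true
    }
}
```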

longSVariant

public java.lang.String longSVariant(java.lang.String spelling)
Apply "long s" heuristics to a spelling.

Parameters:
spelling - Spelling suggestion to which to apply heuristics.
Returns:
Revised spelling.
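
The documentation does not say which "long s" heuristics are applied. As one plausible example, the sketch below maps the long s character (U+017F) to a modern "s". The class LongSSketch and the method replaceLongS are hypothetical, and the actual method may apply different or additional rules.

```java
// Hypothetical illustration, not part of the documented package.
final class LongSSketch {

    // The archaic long s character, written as an escape to avoid
    // source-encoding problems.
    private static final char LONG_S = '\u017F';

    // Returns the spelling with every long s replaced by a modern 's'.
    static String replaceLongS(String spelling) {
        return spelling.replace(LONG_S, 's');
    }

    public static void main(String[] args) {
        System.out.println(replaceLongS("bu\u017Fine\u017F\u017Fe")); // prints businesse
    }
}
```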

preprocessSpelling

public java.lang.String preprocessSpelling(java.lang.String spelling)
Preprocess spelling.

Specified by:
preprocessSpelling in interface SpellingStandardizer
Overrides:
preprocessSpelling in class ExtendedSimpleSpellingStandardizer
Parameters:
spelling - Spelling to preprocess.
Returns:
Preprocessed spelling.

standardizeSpelling

public java.lang.String[] standardizeSpelling(java.lang.String spelling)
Returns standard spellings given a spelling.

Specified by:
standardizeSpelling in interface SpellingStandardizer
Overrides:
standardizeSpelling in class ExtendedSimpleSpellingStandardizer
Parameters:
spelling - The spelling.
Returns:
The standard spellings as an array of String.

getBestSuggestedSpelling

public java.lang.String getBestSuggestedSpelling(java.lang.String spelling)
Get best suggested spelling.

Parameters:
spelling - The spelling for which to return suggestion.
Returns:
Best (by score) suggested spelling.
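
Selecting a best suggestion by score can be sketched as below. The nested Scored class is a hypothetical stand-in for ScoredString, and two further assumptions are made: a higher score means a better suggestion, and the caller supplies a fallback for the case where there are no suggestions. The documented method may behave differently on both points.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration, not part of the documented package.
final class BestSuggestionSketch {

    // Minimal stand-in for a string paired with a score.
    static final class Scored {
        final String text;
        final double score;

        Scored(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    // Returns the text with the highest score, or fallback if the list is empty.
    static String best(List<Scored> suggestions, String fallback) {
        return suggestions.stream()
                .max(Comparator.comparingDouble((Scored s) -> s.score))
                .map(s -> s.text)
                .orElse(fallback);
    }

    public static void main(String[] args) {
        List<Scored> suggestions = Arrays.asList(
                new Scored("lone", 0.4),
                new Scored("love", 0.9));
        System.out.println(best(suggestions, "loue")); // prints love
    }
}
```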

getScoredSuggestedSpellings

public java.util.List<ScoredString> getScoredSuggestedSpellings(java.lang.String spelling)
Return suggested spellings with scores.

Parameters:
spelling - The spelling for which to return suggestions.
Returns:
List of suggested spellings with scores.

getSuggestedSpellings

public java.lang.String[] getSuggestedSpellings(java.lang.String spelling)
Return suggested spellings.

Parameters:
spelling - The spelling for which to return suggestions.
Returns:
Array of strings of suggested spellings.

toString

public java.lang.String toString()
Return standardizer description.

Overrides:
toString in class ExtendedSimpleSpellingStandardizer
Returns:
Standardizer description.