Interface | Description |
---|---|
SpellingStandardizer | Interface for a spelling standardizer. |
Class | Description |
---|---|
AbstractSpellingStandardizer | Abstract spelling standardizer. |
DecruftifyingSpellingStandardizer | Cleans up spellings. |
DefaultSpellingStandardizer | Default spelling standardizer. |
EnglishDecruftifier | Spelling decruftifier. |
EnglishDecruftifier.CruftySpelling | Holds a spelling in the process of decruftification. |
ExtendedSearchSpellingStandardizer | Extended search spelling standardizer. |
ExtendedSimpleSpellingStandardizer | Maps alternate spellings to standard spellings. |
GapFiller | Finds candidate words to match words with gaps. |
NoopSpellingStandardizer | Returns the original spelling unchanged. |
SimpleSpellingStandardizer | Maps alternate spellings to standard spellings. |
SpellingStandardizerFactory | Spelling standardizer factory. |
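GapFiller is described only as finding candidate words for words with gaps. As a rough illustration, and not GapFiller's actual API or algorithm, the sketch below assumes each missing character is marked with a bullet (•, U+2022) and that every gap stands for exactly one unknown letter. It turns the gapped word into a regular expression and matches it against a word list. The class name GapMatcher and the gap marker are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical gap-matching sketch; not MorphAdorner's GapFiller API. */
class GapMatcher {
    /** Assumed marker for one missing character. */
    static final char GAP = '\u2022';

    private final List<String> knownWords;

    GapMatcher(List<String> knownWords) {
        this.knownWords = knownWords;
    }

    /** Builds a regex in which each gap matches exactly one letter. */
    static Pattern toPattern(String gappedWord) {
        StringBuilder regex = new StringBuilder();
        for (char c : gappedWord.toCharArray()) {
            if (c == GAP) {
                regex.append("\\p{L}"); // any single letter
            } else {
                regex.append(Pattern.quote(String.valueOf(c))); // literal character
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE);
    }

    /** Returns every known word that the gapped word could stand for. */
    List<String> candidates(String gappedWord) {
        Pattern pattern = toPattern(gappedWord);
        List<String> result = new ArrayList<>();
        for (String word : knownWords) {
            if (pattern.matcher(word).matches()) {
                result.add(word);
            }
        }
        return result;
    }
}
```

For example, with the word list knight, night, knot, kind, the gapped form kn•ght yields only knight, and kn•• yields only knot. A real gap filler would also need to rank multiple candidates, which this sketch does not attempt.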
English texts of the past exhibit far greater spelling variance than contemporary texts. Texts from the seventeenth century and earlier use conventions that differ from contemporary standards in, among other things, the use of "u", "v", and "y" and in capitalization. The same word is often spelled differently even within a single work. By the eighteenth century, texts employ much more modern orthographic standards, except for capitalization.
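Some of these conventions can be expressed as simple rules. Early printers often used "v" at the start of a word and "u" elsewhere, regardless of sound, so "vpon" stands for "upon" and "loue" for "love". The hypothetical UvNormalizer below applies two heuristics based on that convention. It is an illustration only, not the rule set used by MorphAdorner's EnglishDecruftifier.

```java
/** Hypothetical sketch of two early modern u/v heuristics; not MorphAdorner's rules. */
class UvNormalizer {
    private static boolean isVowel(char c) {
        return "aeiouAEIOU".indexOf(c) >= 0;
    }

    static String normalize(String word) {
        char[] chars = word.toCharArray();
        // Word-initial "v" before a consonant usually stood for "u": "vpon" -> "upon".
        if (chars.length > 1 && (chars[0] == 'v' || chars[0] == 'V') && !isVowel(chars[1])) {
            chars[0] = chars[0] == 'v' ? 'u' : 'U';
        }
        // Medial "u" between two vowels usually stood for "v": "loue" -> "love".
        for (int i = 1; i < chars.length - 1; i++) {
            if (chars[i] == 'u' && isVowel(chars[i - 1]) && isVowel(chars[i + 1])) {
                chars[i] = 'v';
            }
        }
        return new String(chars);
    }
}
```

With these rules, "vpon" becomes "upon", "haue" becomes "have", and "house" and "very" are left alone. The rules are blunt, though: "queue" becomes "queve", which is why rule output needs to be checked against word lists.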
MorphAdorner uses rules, word lists, and extended search techniques, such as spelling correction methods and other heuristics, to map variant spellings to their standard (usually modern) form. For obsolete words no longer in use, MorphAdorner chooses a representative standard form, usually the Oxford English Dictionary headword. At present MorphAdorner knows about 336,000 variant spellings. Using this list, it can automatically determine the correct standard form for many previously unseen spellings.
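The two-stage approach described above, a word-list lookup followed by a search for unseen spellings, might be sketched as follows. This is a minimal illustration. It assumes a plain Levenshtein edit distance as the only search heuristic and a hypothetical maxDistance cap. MorphAdorner's extended search combines more techniques than this, and VariantLookup is not one of its classes.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: exact variant lookup, then nearest standard form by edit distance. */
class VariantLookup {
    private final Map<String, String> variantToStandard = new HashMap<>();
    private final List<String> standardForms; // assumed to be lowercase
    private final int maxDistance;

    VariantLookup(List<String> standardForms, int maxDistance) {
        this.standardForms = standardForms;
        this.maxDistance = maxDistance;
    }

    void addVariant(String variant, String standard) {
        variantToStandard.put(variant.toLowerCase(), standard);
    }

    String standardize(String spelling) {
        String key = spelling.toLowerCase();
        // 1. Known variant: return its recorded standard form.
        String mapped = variantToStandard.get(key);
        if (mapped != null) {
            return mapped;
        }
        // 2. Unseen spelling: pick the closest standard form within maxDistance.
        String best = spelling;
        int bestDistance = maxDistance + 1;
        for (String candidate : standardForms) {
            int d = levenshtein(key, candidate);
            if (d < bestDistance) {
                best = candidate;
                bestDistance = d;
            }
        }
        return best; // original spelling if nothing is close enough
    }

    /** Classic dynamic-programming edit distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

With standard forms music, public, and magic and a maxDistance of 2, the recorded variant musick maps to music by lookup. The unseen publick maps to public at distance 1. The spelling xyzzy is more than two edits from every form, so it comes back unchanged. A tight cap trades coverage for fewer far-fetched matches.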
Sometimes a new spelling is too different from any of the spellings MorphAdorner already knows. Applying the extended search facilities to such a spelling may produce a "standard spelling" that veers far from the correct form. As time goes on, we hope to reduce the occurrence of such errors.
Orthographic standardization improves the quality of part-of-speech tagging, name recognition, and text searching. However, standardization by itself isn't sufficient to fix some other problems. These include the lack of an apostrophe to mark the possessive case and the inconsistent use of capitalization to mark proper nouns.
All MorphAdorner spelling standardizers must implement the SpellingStandardizer interface. The SpellingStandardizerFactory provides the mechanism for instantiating a default or specified instance of a SpellingStandardizer implementation. The AbstractSpellingStandardizer serves as a base class for deriving concrete implementations of spelling standardizers.
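The division of labor among an interface, an abstract base class, and a factory can be sketched in a few lines. Everything here is hypothetical. Standardizer, AbstractStandardizer, SimpleStandardizer, NoopStandardizer, StandardizerFactory, and the "noop" and "simple" keys are stand-ins, and MorphAdorner's real SpellingStandardizer interface may declare more methods than these two.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical minimal contract that every standardizer implements. */
interface Standardizer {
    String standardize(String spelling);

    void addMappedSpelling(String variant, String standard);
}

/** Shared plumbing for concrete standardizers: storage for variant mappings. */
abstract class AbstractStandardizer implements Standardizer {
    protected final Map<String, String> mappedSpellings = new HashMap<>();

    @Override
    public void addMappedSpelling(String variant, String standard) {
        mappedSpellings.put(variant, standard);
    }
}

/** Returns every spelling unchanged, ignoring any mappings. */
class NoopStandardizer extends AbstractStandardizer {
    @Override
    public String standardize(String spelling) {
        return spelling;
    }
}

/** Maps known variants to standard forms; unknown spellings pass through. */
class SimpleStandardizer extends AbstractStandardizer {
    @Override
    public String standardize(String spelling) {
        return mappedSpellings.getOrDefault(spelling, spelling);
    }
}

/** Chooses an implementation by name, falling back to a default. */
final class StandardizerFactory {
    private StandardizerFactory() {
    }

    static Standardizer newStandardizer(String name) {
        if (name == null) {
            return new SimpleStandardizer(); // assumed default
        }
        switch (name) {
            case "noop":
                return new NoopStandardizer();
            case "simple":
                return new SimpleStandardizer();
            default:
                throw new IllegalArgumentException("Unknown standardizer: " + name);
        }
    }

    static Standardizer newStandardizer() {
        return newStandardizer(null);
    }
}
```

Because the factory returns the interface type, calling code can switch implementations without other changes. For example, it could use the no-op standardizer for a text that needs no standardization.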