Creating A Suffix Lexicon

CreateSuffixLexicon creates a suffix lexicon from a word lexicon.

Usage:

createsuffixlexicon inputwordlexicon.lex suffixlexicon.lex maxsuffixlength maxsuffixcount allowedpostagsfilename

where

  • inputwordlexicon.lex specifies the name of the input word lexicon file, in MorphAdorner format, from which the suffix lexicon is built.

  • suffixlexicon.lex specifies the name of the output file to receive the suffix lexicon.

  • maxsuffixlength specifies the maximum length, in characters, of the suffixes generated for the suffix lexicon. The default is 6 characters.

  • maxsuffixcount specifies the maximum number of times a spelling may appear in order for its suffixes to be added to the suffix lexicon. The default is to include all words regardless of count.

    For some applications you may want to restrict the suffix lexicon to suffixes of infrequently occurring words only. Values of 10 (only include spellings which appear 10 or fewer times in the training data) or 1 (only include spellings which appear once in the training data) are popular choices.

  • allowedpostagsfilename specifies the name of a file containing a list of part of speech tags to use when constructing the suffix lexicon. Omit the tags for closed word classes, to which new words should not be added. The MorphAdorner release provides the file nuposallowedpostags.txt in the release data directory, which defines a default set of NUPos tags to use when creating a suffix lexicon.
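
The way the three optional parameters interact can be illustrated with a minimal Python sketch. This is not MorphAdorner's code and does not read or write the MorphAdorner lexicon file format. The function name build_suffix_lexicon, the in-memory dictionary layout, the choice to compare maxsuffixcount against a spelling's total count over all tags, and the tags in the example are all assumptions made for illustration.

```python
from collections import defaultdict


def build_suffix_lexicon(word_lexicon, max_suffix_length=6,
                         max_suffix_count=None, allowed_tags=None):
    """Build a suffix -> {tag: count} mapping from a word lexicon.

    word_lexicon maps each spelling to a {tag: count} dictionary.
    This layout is a hypothetical stand-in for a real lexicon file.
    """
    suffix_lexicon = defaultdict(lambda: defaultdict(int))
    for spelling, tag_counts in word_lexicon.items():
        # Assumption: the count limit applies to the spelling's total count.
        total = sum(tag_counts.values())
        if max_suffix_count is not None and total > max_suffix_count:
            continue  # Spelling is too frequent; skip its suffixes.
        longest = min(max_suffix_length, len(spelling))
        for length in range(1, longest + 1):
            suffix = spelling[-length:]
            for tag, count in tag_counts.items():
                # Only tags from the allowed list contribute to the lexicon.
                if allowed_tags is None or tag in allowed_tags:
                    suffix_lexicon[suffix][tag] += count
    return {suffix: dict(tags) for suffix, tags in suffix_lexicon.items()}


# Illustrative data only; the tags are examples, not a complete tag set.
words = {
    "walked": {"vvd": 1, "vvn": 1},
    "talked": {"vvd": 3},
    "the": {"dt": 500},
}
suffixes = build_suffix_lexicon(words, max_suffix_length=3,
                                max_suffix_count=10,
                                allowed_tags={"vvd", "vvn"})
print(suffixes["ked"])
```

In this sketch the frequent spelling "the" contributes nothing because its count exceeds the limit of 10, and no suffix longer than three characters is produced.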

The suffix lexicon is used by the part of speech taggers to guess the potential parts of speech for unknown words which do not appear in the word lexicon. For each successively shorter ending substring of the unknown word, the guesser looks up that substring in the suffix lexicon. When the substring exists in the suffix lexicon, the guesser assigns its associated parts of speech to the unknown word.
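
The lookup described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the guesser the taggers actually use. The function name guess_tags and the dictionary layout are hypothetical, and the sketch assumes that the search stops at the longest suffix found and that the whole word counts as the first candidate when it is no longer than the maximum suffix length.

```python
def guess_tags(word, suffix_lexicon, max_suffix_length=6):
    """Return the {tag: count} entry for the longest known suffix of word.

    suffix_lexicon maps suffix strings to {tag: count} dictionaries.
    Returns an empty dictionary when no suffix of the word is known.
    """
    # Start with the longest candidate suffix, then try shorter ones.
    start = max(0, len(word) - max_suffix_length)
    for i in range(start, len(word)):
        suffix = word[i:]
        if suffix in suffix_lexicon:
            # Assumption: the first (longest) match decides the result.
            return suffix_lexicon[suffix]
    return {}


# Illustrative suffix lexicon; tags and counts are made up for the example.
suffix_lexicon = {
    "ked": {"vvd": 4, "vvn": 1},
    "d": {"vvd": 4, "vvn": 1, "n1": 2},
}
print(guess_tags("balked", suffix_lexicon))
print(guess_tags("bad", suffix_lexicon))
```

Here "balked" matches the three-character suffix "ked" before the shorter "d" is ever tried, while "bad" falls through to the single-character suffix "d".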
