public class CreateSuffixLexicon
extends java.lang.Object
java -Xmx512m edu.northwestern.at.morphadorner.tools.createsuffixlexicon.CreateSuffixLexicon inputwordlexicon outputsuffixlexicon maxsuffixlength maxsuffixcount allowedpostagsfilename
inputwordlexicon specifies the name of the input file containing the word lexicon from which to extract a suffix lexicon.
outputsuffixlexicon specifies the name of the output file to receive the suffix lexicon.
maxsuffixlength specifies the maximum length suffix generated for the suffix lexicon. The default is 6.
maxsuffixcount specifies the maximum number of times a spelling can appear in order for its suffix to be added to the suffix lexicon. The default is to include all words regardless of count.
For some applications you may want to restrict the suffix lexicon to suffixes of infrequently occurring words only. Common choices are 10 (only include spellings which appear 10 or fewer times in the training data) and 1 (only include spellings which appear once in the training data).
allowedpostagsfilename specifies the name of a file containing a list of part of speech tags to use when constructing the suffix lexicon. Parts of speech not on this list will typically be those for closed word classes, to which new words should not be added. The default is to include all part of speech tags.
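The way these parameters interact can be illustrated with a minimal sketch. It is not MorphAdorner's actual implementation: the class `SuffixLexiconSketch`, the method `buildSuffixLexicon`, and the in-memory map layout (spelling to tag to count) are hypothetical. The sketch also assumes that the count limit applies to a spelling's total count across all tags, that suffix counts accumulate the word counts, and that a suffix is always shorter than the whole word. The tag `vvg` and the sample words are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class SuffixLexiconSketch {

    /**
     * Builds suffix -> (tag -> count) from spelling -> (tag -> count).
     * Hypothetical helper: the names and the data layout are assumptions.
     */
    public static Map<String, Map<String, Integer>> buildSuffixLexicon(
            Map<String, Map<String, Integer>> wordLexicon,
            int minSuffixLength,
            int maxSuffixLength,
            int maxSuffixCount,
            Set<String> allowedPosTags) {

        Map<String, Map<String, Integer>> suffixLexicon = new TreeMap<>();

        for (Map.Entry<String, Map<String, Integer>> entry : wordLexicon.entrySet()) {
            String spelling = entry.getKey();

            // Assumption: the limit applies to the spelling's total count.
            int total = 0;
            for (int count : entry.getValue().values()) {
                total += count;
            }
            if (total > maxSuffixCount) {
                continue; // spelling is too frequent to contribute suffixes
            }

            for (Map.Entry<String, Integer> tagCount : entry.getValue().entrySet()) {
                String tag = tagCount.getKey();

                // A null tag set stands for "allow all tags" in this sketch.
                if (allowedPosTags != null && !allowedPosTags.contains(tag)) {
                    continue;
                }

                // Assumption: a suffix is strictly shorter than the word.
                int longest = Math.min(maxSuffixLength, spelling.length() - 1);
                for (int len = minSuffixLength; len <= longest; len++) {
                    String suffix = spelling.substring(spelling.length() - len);
                    suffixLexicon
                        .computeIfAbsent(suffix, k -> new TreeMap<>())
                        .merge(tag, tagCount.getValue(), Integer::sum);
                }
            }
        }
        return suffixLexicon;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> words = new HashMap<>();
        Map<String, Integer> walking = new HashMap<>();
        walking.put("vvg", 3);
        words.put("walking", walking);
        Map<String, Integer> the = new HashMap<>();
        the.put("dt", 500);
        words.put("the", the);

        // Suffix lengths 1..3, spellings seen at most 10 times, all tags allowed.
        System.out.println(buildSuffixLexicon(words, 1, 3, 10, null));
    }
}
```

In this sketch the frequent word "the" contributes nothing because its count exceeds the limit of 10, which mirrors the intent of restricting the suffix lexicon to infrequently occurring words.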
Modifier and Type | Field and Description |
---|---|
protected static java.util.Set<java.lang.String> | allowedPosTags: Holds the allowed part of speech tags. |
protected static java.lang.String | allowedPosTagsFileName: Name of the file containing the list of part of speech tags to be used when creating the suffix lexicon entries. |
protected static int | maxSuffixCount: Only words which appear no more than maxSuffixCount times are used to generate the suffix lexicon. |
protected static int | maxSuffixLength: Maximum length of suffixes to generate. |
protected static int | minSuffixLength: Minimum length of suffixes to generate. |
protected static java.lang.String | suffixLexiconFileName: Output suffix lexicon file name. |
protected static java.lang.String | wordLexiconFileName: Input word lexicon file name. |
Constructor and Description |
---|
CreateSuffixLexicon() |
Modifier and Type | Method and Description |
---|---|
protected static void | help(): Display brief help. |
protected static boolean | initialize(java.lang.String[] args): Initialize. |
static void | main(java.lang.String[] args): Main program. |
protected static java.lang.String wordLexiconFileName
protected static java.lang.String suffixLexiconFileName
protected static int maxSuffixCount
The default is to use all words regardless of word count.
protected static int maxSuffixLength
protected static int minSuffixLength
protected static java.lang.String allowedPosTagsFileName
protected static java.util.Set<java.lang.String> allowedPosTags
protected static void help()
protected static boolean initialize(java.lang.String[] args)
args - Command line arguments.
public static void main(java.lang.String[] args)
args - Command line arguments.
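As an illustration of how the five positional command line arguments and their documented defaults might be handled, here is a minimal sketch. It is not the actual body of initialize: the class `SuffixLexiconArgsSketch` and its `parse` method are hypothetical. The sketch assumes that only the first two arguments are required, that "no count limit" can be represented by `Integer.MAX_VALUE`, and that a missing tag file name (represented as `null`) means all tags are allowed.

```java
import java.util.Optional;

public class SuffixLexiconArgsSketch {

    /** Documented default for maxsuffixlength. */
    public static final int DEFAULT_MAX_SUFFIX_LENGTH = 6;

    /** Assumption: "all words regardless of count" is modelled as no upper limit. */
    public static final int NO_COUNT_LIMIT = Integer.MAX_VALUE;

    public final String wordLexiconFileName;
    public final String suffixLexiconFileName;
    public final int maxSuffixLength;
    public final int maxSuffixCount;
    /** Null means "use all part of speech tags" in this sketch. */
    public final String allowedPosTagsFileName;

    private SuffixLexiconArgsSketch(String wordLexiconFileName,
                                    String suffixLexiconFileName,
                                    int maxSuffixLength,
                                    int maxSuffixCount,
                                    String allowedPosTagsFileName) {
        this.wordLexiconFileName = wordLexiconFileName;
        this.suffixLexiconFileName = suffixLexiconFileName;
        this.maxSuffixLength = maxSuffixLength;
        this.maxSuffixCount = maxSuffixCount;
        this.allowedPosTagsFileName = allowedPosTagsFileName;
    }

    /**
     * Parses positional arguments; returns empty when the two file names
     * are missing or a numeric argument is not a valid integer.
     */
    public static Optional<SuffixLexiconArgsSketch> parse(String[] args) {
        if (args == null || args.length < 2) {
            return Optional.empty();
        }
        try {
            int maxLength = args.length > 2
                ? Integer.parseInt(args[2]) : DEFAULT_MAX_SUFFIX_LENGTH;
            int maxCount = args.length > 3
                ? Integer.parseInt(args[3]) : NO_COUNT_LIMIT;
            String tagsFile = args.length > 4 ? args[4] : null;
            return Optional.of(new SuffixLexiconArgsSketch(
                args[0], args[1], maxLength, maxCount, tagsFile));
        } catch (NumberFormatException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Hypothetical file names, used only for demonstration.
        parse(new String[] {"words.lex", "suffix.lex", "4", "10"})
            .ifPresent(p -> System.out.println(
                p.maxSuffixLength + " " + p.maxSuffixCount));
    }
}
```

Returning an empty `Optional` plays the role of the `false` result of a boolean initializer: the caller can display the help text and stop.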