CreateLexicon (MorphAdorner)

java.lang.Object
- edu.northwestern.at.morphadorner.tools.createlexicon.CreateLexicon

```
public class CreateLexicon
extends java.lang.Object
```
Generate lexicons from training data.
java -Xmx512m edu.northwestern.at.morphadorner.tools.createlexicon.CreateLexicon trainingdata outputwordlexicon outputsuffixlexicon maxsuffixlength maxsuffixcount
- trainingdata specifies the name of the file containing the part of speech training data from which the word lexicon and suffix lexicon are built. The word lexicon contains each spelling (and standard spellings if provided), the count for each spelling, the parts of speech for each spelling, the counts for each part of speech for each spelling, and the lemma for each part of speech for each spelling (if provided). The suffix lexicon contains a list of suffixes, their counts, and the parts of speech associated with each suffix and the count of each part of speech. Lemmata are stored as "*" in the suffix lexicon since there are no lemmata for suffixes.
  
  The training data resides in a UTF-8 text file. Each line contains one spelling along with its part of speech tag and, optionally, its lemma and standard spelling, separated by tabs (\t), in the form:
  
  spelling \t part-of-speech-tag \t lemma \t standardspelling
  
  You must specify a spelling and a part of speech tag. The lemma and standard spelling are optional. If you wish to specify a standard spelling without specifying a lemma, enter the lemma as "*".
  
  Blank lines are used to separate sentences. While the blank lines are not needed for creating the lexicon, they are needed for creating probability transition matrices and for part of speech tagging.
  
  The lexicon is built using both the spelling and the standard spelling (when provided). The lemma is also stored when present.
- outputwordlexicon specifies the name of the output file to receive the word lexicon.
- outputsuffixlexicon specifies the name of the output file to receive the suffix lexicon.
- maxsuffixlength specifies the maximum length of the suffixes generated for the suffix lexicon. The default is 6.
- maxsuffixcount specifies the maximum number of times a spelling can appear in order for its suffix to be added to the suffix lexicon. The default is to include all words regardless of count.
  
  For some applications you may want to restrict the suffix lexicon to contain suffixes only for infrequently occurring words. Values of 10 (only include spellings which appear 10 or fewer times in the training data) or 1 (only include spellings which appear once in the training data) are popular choices.
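
The training data format and the two suffix parameters can be illustrated with a minimal sketch. This is not the CreateLexicon implementation: the class `LexiconSketch` and its methods `parseLine`, `suffixes`, and `suffixLexicon` are hypothetical names introduced here. It also assumes that suffixes are counted once per training line, that a suffix is always shorter than the whole spelling, and that spellings are used exactly as they appear; the actual tool may differ on each point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not part of MorphAdorner.
class LexiconSketch {

    /**
     * Parses one training line into {spelling, tag, lemma, standardSpelling}.
     * Returns null for a blank line (sentence separator). A lemma of "*"
     * or a missing field is returned as null.
     */
    static String[] parseLine(String line) {
        if (line.trim().isEmpty()) return null;
        String[] f = line.split("\t");
        if (f.length < 2) {
            throw new IllegalArgumentException("Spelling and tag are required: " + line);
        }
        String lemma = (f.length > 2 && !f[2].equals("*")) ? f[2] : null;
        String standard = (f.length > 3) ? f[3] : null;
        return new String[] { f[0], f[1], lemma, standard };
    }

    /** Suffixes of length 1..maxLength, assumed shorter than the whole word. */
    static List<String> suffixes(String word, int maxLength) {
        List<String> result = new ArrayList<>();
        for (int n = 1; n <= maxLength && n < word.length(); n++) {
            result.add(word.substring(word.length() - n));
        }
        return result;
    }

    /** Maps each suffix to its part of speech counts (assumed counting scheme). */
    static Map<String, Map<String, Integer>> suffixLexicon(
            List<String> lines, int maxSuffixLength, int maxSuffixCount) {
        // First pass: count how often each spelling occurs.
        Map<String, Integer> spellingCounts = new TreeMap<>();
        List<String[]> entries = new ArrayList<>();
        for (String line : lines) {
            String[] entry = parseLine(line);
            if (entry == null) continue;
            entries.add(entry);
            spellingCounts.merge(entry[0], 1, Integer::sum);
        }
        // Second pass: skip spellings seen more than maxSuffixCount times.
        Map<String, Map<String, Integer>> lexicon = new TreeMap<>();
        for (String[] entry : entries) {
            if (spellingCounts.get(entry[0]) > maxSuffixCount) continue;
            for (String suffix : suffixes(entry[0], maxSuffixLength)) {
                lexicon.computeIfAbsent(suffix, k -> new TreeMap<>())
                       .merge(entry[1], 1, Integer::sum);
            }
        }
        return lexicon;
    }

    public static void main(String[] args) {
        // The tags below are made-up examples, not a specific tag set.
        List<String> lines = Arrays.asList(
            "walked\tvvd\twalk", "talked\tvvd\ttalk", "", "the\tdt", "the\tdt");
        System.out.println(suffixLexicon(lines, 2, 1));
    }
}
```

With a maximum suffix count of 1, the spelling "the" (seen twice in the sample data) contributes no suffixes, while "walked" and "talked" each contribute "d" and "ed".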

Field Summary

Fields
Modifier and Type	Field and Description
`protected static int`	`maxSuffixCount` Only use words that appear no more than maxSuffixCount times to generate the suffix lexicon.
`protected static int`	`maxSuffixLength` Maximum and minimum length of suffixes to generate.
`protected static int`	`minSuffixLength`
`protected static java.lang.String`	`suffixLexiconFileName` Output suffix lexicon file name.
`protected static java.lang.String`	`trainingDataFileName` Training data file name.
`protected static java.lang.String`	`wordLexiconFileName` Output word lexicon file name.

Constructor Summary

Constructors
Constructor and Description
`CreateLexicon()`

Method Summary

Methods
Modifier and Type	Method and Description
`protected static void`	`help()` Display brief help.
`protected static boolean`	`initialize(java.lang.String[] args)` Initialize.
`static void`	`main(java.lang.String[] args)` Main program.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - trainingDataFileName
```
protected static java.lang.String trainingDataFileName
```
    Training data file name.
  - wordLexiconFileName
```
protected static java.lang.String wordLexiconFileName
```
    Output word lexicon file name.
  - suffixLexiconFileName
```
protected static java.lang.String suffixLexiconFileName
```
    Output suffix lexicon file name.
  - maxSuffixCount
```
protected static int maxSuffixCount
```
    Only use words that appear no more than maxSuffixCount times to generate the suffix lexicon.
    The default is to use all words regardless of word count.
  - maxSuffixLength
```
protected static int maxSuffixLength
```
    Maximum and minimum length of suffixes to generate.
  - minSuffixLength
```
protected static int minSuffixLength
```
- Constructor Detail
  - CreateLexicon
```
public CreateLexicon()
```
- Method Detail
  - help
```
protected static void help()
```
    Display brief help.
  - initialize
```
protected static boolean initialize(java.lang.String[] args)
```
    Initialize.
    
    Parameters:
    args - Command line arguments.
  - main
```
public static void main(java.lang.String[] args)
```
    Main program.
    
    Parameters:
    args - Command line arguments.
