Northwestern University Information Technology
A MorphAdorner word lexicon for a corpus stores every spelling that appears in the corpus, along with the lemmata and parts of speech for each spelling. Each lexicon entry also records the number of times that spelling appears, both overall and broken down by part of speech. MorphAdorner currently provides two English language lexicons, one for Early Modern English and one for Nineteenth Century Fiction.
MorphAdorner augments the lexicons with auxiliary lists of words which do not appear in the corpus. These include extensive lists of proper names, common foreign words, and combinations of existing words with parts of speech that do not appear in the corpus. Each such entry is assigned an "occurrence" count of one. The auxiliary lists improve MorphAdorner's ability to adorn text with parts of speech and to recognize proper names and places.
Lexicon files are plain text files encoded in UTF-8. Each line in a lexicon file takes the following form:
spelling countspelling pos1 lemma1 countpos1 pos2 lemma2 countpos2 ...
These fields are separated by tab characters.
The raw counts are stored rather than probabilities so that new training data can be used to update the lexicon easily, and so that individual part of speech taggers can apply different methods of count smoothing.
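As a rough illustration of why raw counts are convenient, the following Python sketch adds new counts to an entry and then derives smoothed probabilities from it. The function name `smoothed_probabilities`, the added counts, the three-tag tagset, and the choice of add-alpha smoothing are all assumptions made for this example; they are not taken from MorphAdorner's own taggers, which may smooth counts differently.

```python
from collections import Counter


def smoothed_probabilities(tag_counts, tagset, alpha=1.0):
    """Convert raw per-tag counts to add-alpha smoothed probabilities."""
    total = sum(tag_counts.get(tag, 0) for tag in tagset) + alpha * len(tagset)
    return {tag: (tag_counts.get(tag, 0) + alpha) / total for tag in tagset}


# Raw counts for the spelling "died" in the nineteenth century fiction lexicon.
counts = Counter({"vvd": 607, "vvn": 196})

# Hypothetical counts from new training data: updating is plain addition.
counts.update({"vvd": 3, "vvn": 1})

# Hypothetical three-tag tagset, kept small for illustration only.
tagset = ["vvd", "vvn", "n1"]
probs = smoothed_probabilities(counts, tagset)
```

Because the stored values are counts, the update step needs no renormalization, and a tagger that prefers a different smoothing method can replace `smoothed_probabilities` without touching the lexicon data.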
The following lines are taken from the Nineteenth Century Fiction lexicon.
die 1660 vvi die 1164 n1 die 22 vvb die 474
die-away 2 j die-away 2
died 803 vvd die 607 vvn die 196
For example, the spelling died appears 803 times in the training data: 607 times as the part of speech vvd and 196 times as the part of speech vvn. Its lemma in both cases is die.
When lemmata are not available, an "*" appears in the lemma field. Suffix lexicons, for example, contain "*" for all lemmata.
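The line format described above can be read with a few lines of Python. This is a minimal sketch, not MorphAdorner's own reader: the names `Analysis`, `LexiconEntry`, and `parse_lexicon_line` are hypothetical, and representing a "*" lemma as `None` is a choice made for this example.

```python
from typing import NamedTuple, Optional


class Analysis(NamedTuple):
    pos: str
    lemma: Optional[str]  # None when the lexicon stores "*" (no lemma)
    count: int


class LexiconEntry(NamedTuple):
    spelling: str
    count: int
    analyses: list


def parse_lexicon_line(line):
    """Parse one tab-separated lexicon line into a LexiconEntry."""
    fields = line.rstrip("\r\n").split("\t")
    # Expect spelling, overall count, then repeating (pos, lemma, count) triples.
    if len(fields) < 2 or (len(fields) - 2) % 3 != 0:
        raise ValueError("malformed lexicon line: %r" % line)
    analyses = []
    for i in range(2, len(fields), 3):
        pos, lemma, count = fields[i:i + 3]
        analyses.append(Analysis(pos, None if lemma == "*" else lemma, int(count)))
    return LexiconEntry(fields[0], int(fields[1]), analyses)


# The "died" entry shown above, with its fields joined by tab characters.
entry = parse_lexicon_line("died\t803\tvvd\tdie\t607\tvvn\tdie\t196")
```

The sketch does not require the per-part-of-speech counts to add up to the overall count, since the text above does not state that every entry satisfies that property.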
You can try looking up spellings in MorphAdorner's Lexicon lookup online.
Academic Technologies and Research Services,
NU Library 2East, 1970 Campus Drive, Evanston, IL 60208.