public class BrillLexicon
extends java.util.HashMap<java.lang.String,java.util.List<java.lang.String>>
A Brill lexicon is a HashMap that maps from a lexical entry (String) to possible POS categories (List<String>).
A Brill lexicon is a simple UTF-8 text file containing words and their possible part-of-speech tags. Each word appears on a separate line. The first token on each line is the word. The remaining tokens are the potential parts of speech for the word, separated by spaces or tab characters. The most commonly occurring part of speech should be listed first.
word pos1 pos2 pos3 ...
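
The line format above can be read into the same kind of map that this class extends. The following is a minimal sketch, not the library's own loader: the class name `BrillFormatSketch`, the `parse` helper, the skipping of blank lines, and the sample words and tags are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the BrillLexicon API.
final class BrillFormatSketch {

    // Parse "word pos1 pos2 ..." lines into a word -> tags map.
    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> lexicon = new HashMap<>();
        for (String rawLine : lines) {
            String line = rawLine.trim();
            if (line.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            // Tokens are separated by one or more spaces or tabs.
            String[] tokens = line.split("[ \\t]+");
            // The first token is the word; the rest are its tags,
            // kept in file order so the most common tag stays first.
            List<String> tags =
                new ArrayList<>(Arrays.asList(tokens).subList(1, tokens.length));
            lexicon.put(tokens[0], tags);
        }
        return lexicon;
    }

    public static void main(String[] args) {
        // Made-up sample entries, for illustration only.
        List<String> sample = Arrays.asList("the DT", "run VB NN", "fast\tRB JJ");
        Map<String, List<String>> lexicon = parse(sample);
        System.out.println(lexicon.get("run"));         // [VB, NN]
        System.out.println(lexicon.get("fast").get(0)); // RB
    }
}
```

Because the tag list preserves file order, `get(word).get(0)` yields the tag listed first, which by the convention above should be the most common one.
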
This lexicon format was popularized by Eric Brill's part-of-speech tagger in the early 1990s.
Constructor and Description |
---|
BrillLexicon(java.net.URL lexiconURL, java.lang.String encoding): Create a Brill lexicon. |
Modifier and Type | Method and Description |
---|---|
void | saveToFile(java.lang.String fileName, java.lang.String encoding): Save Brill lexicon to a file. |
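
As a rough illustration of what saving in this format involves, the sketch below writes a word-to-tags map back out as `word pos1 pos2 ...` lines in a caller-supplied encoding. It is not the implementation of `saveToFile`: the class `BrillSaveSketch`, its `toLines` and `save` helpers, the alphabetical ordering of entries, and the sample data are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the BrillLexicon API.
final class BrillSaveSketch {

    // Render each entry as "word pos1 pos2 ...".
    // Assumption: entries are sorted by word so the output is deterministic.
    static List<String> toLines(Map<String, List<String>> lexicon) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : new TreeMap<>(lexicon).entrySet()) {
            lines.add(entry.getKey() + " " + String.join(" ", entry.getValue()));
        }
        return lines;
    }

    // Write the rendered lines to fileName using the named character encoding.
    static void save(Map<String, List<String>> lexicon, String fileName, String encoding)
            throws IOException {
        Files.write(Paths.get(fileName), toLines(lexicon), Charset.forName(encoding));
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample entries, for illustration only.
        Map<String, List<String>> lexicon = new HashMap<>();
        lexicon.put("run", Arrays.asList("VB", "NN"));
        lexicon.put("the", Arrays.asList("DT"));

        Path tmp = Files.createTempFile("lexicon", ".txt");
        save(lexicon, tmp.toString(), "UTF-8");
        System.out.println(Files.readAllLines(tmp, Charset.forName("UTF-8")));
        // [run VB NN, the DT]
    }
}
```
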
Methods inherited from class java.util.HashMap: clear, clone, containsKey, containsValue, entrySet, get, isEmpty, keySet, put, putAll, remove, size, values