public class BrillLexicon extends java.util.HashMap<java.lang.String,java.util.List<java.lang.String>>
A Brill lexicon is a HashMap that maps from a lexical entry (String) to its possible POS categories.
A Brill lexicon is a simple UTF-8 text file containing words and their possible part-of-speech tags. Each word appears on its own line. The first token on each line is the word. The remaining tokens are the potential parts of speech for the word, separated by spaces or tab characters. The most commonly occurring part of speech should be listed first.
word pos1 pos2 pos3 ...
This lexicon format was popularized by Eric Brill's part-of-speech tagger in the early 1990s.
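The line format described above can be illustrated with a minimal sketch. This is not the BrillLexicon implementation: the class name `BrillLexiconFormatSketch`, the method `parseLines`, and the sample words and tags are all assumptions made for illustration. It only shows how "word pos1 pos2 ..." lines could be turned into a word-to-tags map, with the most common tag kept first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the documented API.
public class BrillLexiconFormatSketch {

    /** Parses lines of the form "word pos1 pos2 ..." into a word-to-tags map. */
    static Map<String, List<String>> parseLines(List<String> lines) {
        Map<String, List<String>> lexicon = new HashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            // Tokens are separated by one or more spaces or tabs.
            String[] tokens = trimmed.split("[ \\t]+");
            // tokens[0] is the word; the rest are its tags, in file order.
            List<String> tags =
                new ArrayList<>(Arrays.asList(tokens).subList(1, tokens.length));
            lexicon.put(tokens[0], tags);
        }
        return lexicon;
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch.
        List<String> sample = Arrays.asList("run VB NN", "the\tDT", "");
        Map<String, List<String>> lexicon = parseLines(sample);
        System.out.println(lexicon.get("run"));        // prints [VB, NN]
        System.out.println(lexicon.get("run").get(0)); // prints VB
    }
}
```

Because the first tag on each line is expected to be the most frequent one, `get(0)` on an entry's list gives a reasonable default tag for that word.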
|Constructor and Description|
|BrillLexicon(java.net.URL lexiconURL, java.lang.String encoding): Create a Brill lexicon.|
|Modifier and Type||Method and Description|
Save Brill lexicon to a file.
Methods inherited from class java.util.HashMap: clear, clone, containsKey, containsValue, entrySet, get, isEmpty, keySet, put, putAll, remove, size, values
Methods inherited from class java.lang.Object: finalize, getClass, notify, notifyAll, wait, wait, wait
public BrillLexicon(java.net.URL lexiconURL, java.lang.String encoding) throws java.io.IOException
lexiconURL - URL for the file containing the lexicon.
encoding - Character encoding of lexicon file text.
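As a rough illustration of what a loader taking these two parameters might do, the sketch below reads a lexicon from a URL using a named character encoding. It is an assumption-laden stand-in, not the library's code: the class `LexiconUrlLoaderSketch`, the method `load`, the temporary file, and the sample entries are all hypothetical, and only standard JDK calls are used.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a URL-based lexicon loader.
public class LexiconUrlLoaderSketch {

    /** Reads "word pos1 pos2 ..." lines from a URL, decoding with the given encoding. */
    static Map<String, List<String>> load(URL lexiconURL, String encoding)
            throws IOException {
        Map<String, List<String>> lexicon = new HashMap<>();
        // An unsupported encoding name surfaces as an IOException subclass.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(lexiconURL.openStream(), encoding))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] tokens = line.trim().split("\\s+");
                if (tokens[0].isEmpty()) {
                    continue; // blank line
                }
                lexicon.put(tokens[0],
                    new ArrayList<>(Arrays.asList(tokens).subList(1, tokens.length)));
            }
        }
        return lexicon;
    }

    public static void main(String[] args) throws IOException {
        // Write a small sample lexicon to a temporary file (invented data).
        Path file = Files.createTempFile("lexicon", ".txt");
        Files.write(file, Arrays.asList("caf\u00e9 NN", "walk VB NN"),
            StandardCharsets.UTF_8);
        try {
            Map<String, List<String>> lexicon = load(file.toUri().toURL(), "utf-8");
            System.out.println(lexicon.get("walk")); // prints [VB, NN]
        } finally {
            Files.delete(file);
        }
    }
}
```

Passing the encoding explicitly matters for entries with non-ASCII characters: decoding a UTF-8 file with a different charset would produce keys that never match the words looked up later.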