public class TrigramTagger extends AbstractPartOfSpeechTagger implements PartOfSpeechTagger
The trigram part-of-speech tagger assigns to the words in a sentence the most probable sequence of tags, as determined by a trigram hidden Markov model in which each word's tag is conditioned on the possible tags of the two preceding words. The Viterbi algorithm is used to reduce the amount of computation required to find the optimal tag assignment.
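The following is a minimal, self-contained sketch of trigram Viterbi decoding. It illustrates the general technique, not this class's actual implementation. The `Model` interface, the `START` padding tag, the toy lexicon, and all probability values are hypothetical and exist only for this example. Scores are log probabilities, and every word is assumed to have at least one candidate tag.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;

public class TrigramViterbiSketch {
    /** Hypothetical padding tag for the two positions before the sentence. */
    static final String START = "<s>";

    /** Hypothetical model interface; all scores are log probabilities. */
    interface Model {
        List<String> tagsFor(String word);
        double logTransition(String prev2, String prev1, String tag);
        double logEmission(String word, String tag);
    }

    /** Returns the highest-scoring tag sequence for the given words. */
    static List<String> decode(List<String> words, Model model) {
        if (words.isEmpty()) return new ArrayList<>();
        // A trellis state is the pair (previous tag, current tag).
        Map<List<String>, Double> scores = new HashMap<>();
        scores.put(List.of(START, START), 0.0);
        // For each position: state -> tag two positions back on the best path.
        List<Map<List<String>, String>> backPointers = new ArrayList<>();

        for (String word : words) {
            Map<List<String>, Double> next = new HashMap<>();
            Map<List<String>, String> back = new HashMap<>();
            for (Map.Entry<List<String>, Double> e : scores.entrySet()) {
                String prev2 = e.getKey().get(0);
                String prev1 = e.getKey().get(1);
                for (String tag : model.tagsFor(word)) {
                    double s = e.getValue()
                        + model.logTransition(prev2, prev1, tag)
                        + model.logEmission(word, tag);
                    List<String> state = List.of(prev1, tag);
                    Double old = next.get(state);
                    if (old == null || s > old) {   // keep only the best path per state
                        next.put(state, s);
                        back.put(state, prev2);
                    }
                }
            }
            scores = next;
            backPointers.add(back);
        }

        // Pick the best final state, then follow the back pointers.
        List<String> best = null;
        for (Map.Entry<List<String>, Double> e : scores.entrySet()) {
            if (best == null || e.getValue() > scores.get(best)) best = e.getKey();
        }
        LinkedList<String> tags = new LinkedList<>();
        String prev1 = best.get(0);
        String tag = best.get(1);
        for (int i = words.size() - 1; i >= 0; i--) {
            tags.addFirst(tag);
            String prev2 = backPointers.get(i).get(List.of(prev1, tag));
            tag = prev1;
            prev1 = prev2;
        }
        return tags;
    }

    /** Toy model with made-up values, for illustration only. */
    static Model toyModel() {
        Map<String, List<String>> lexicon = Map.of(
            "the", List.of("DET"),
            "can", List.of("NOUN", "VERB"),
            "rusts", List.of("VERB", "NOUN"));
        Map<List<String>, Double> trigrams = Map.of(
            List.of(START, START, "DET"), -0.1,
            List.of(START, "DET", "NOUN"), -0.5,
            List.of(START, "DET", "VERB"), -3.0,
            List.of("DET", "NOUN", "VERB"), -0.5,
            List.of("DET", "NOUN", "NOUN"), -2.0,
            List.of("DET", "VERB", "NOUN"), -1.0,
            List.of("DET", "VERB", "VERB"), -4.0);
        return new Model() {
            public List<String> tagsFor(String word) {
                return lexicon.getOrDefault(word, List.of("NOUN"));
            }
            public double logTransition(String p2, String p1, String t) {
                return trigrams.getOrDefault(List.of(p2, p1, t), -10.0);
            }
            public double logEmission(String word, String tag) {
                return 0.0;  // uniform emissions keep the example small
            }
        };
    }
}
```

Because each state records only the best-scoring path that reaches it, the work per word grows with the number of tag pairs rather than with the number of complete tag sequences, which is the saving the Viterbi algorithm provides.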
Modifier and Type | Field and Description |
---|---|
protected int | beamSearchRejections - Total number of states rejected by beam search criterion. |
protected Map3D<java.lang.String,java.lang.String,java.lang.String,Probability> | contextualProbabilities - Contextual probabilities for a word in a sentence. |
protected boolean | debug - True for debug output. |
protected int | linesTagged - Count of lines tagged. |
protected Viterbi | viterbi - Viterbi trellis for tags and probability scores. |
protected int | wordsTagged - Count of words tagged. |
contextRules, contextualSmoother, dynamicLexicon, lexicalRules, lexicalSmoother, lexicon, logger, partOfSpeechGuesser, postTokenizer, retagger, ruleCorrections, transitionMatrix
Constructor and Description |
---|
TrigramTagger() - Create a trigram tagger. |
Modifier and Type | Method and Description |
---|---|
protected java.util.List<java.lang.String> | processWord(int wordIndex, java.lang.String word, java.util.List<java.lang.String> previousPreviousTags, java.util.List<java.lang.String> previousTags, java.util.List<java.lang.String> tags) - Process a single word. |
protected void | reportEndOfTaggingStats() - Report end of tagging statistics. |
void | setLogger(Logger logger) - Set the logger. |
<T extends AdornedWord> java.util.List<T> | tagAdornedWordList(java.util.List<T> taggedSentence) - Tag a sentence comprised of a list of adorned words. |
<T extends AdornedWord> java.util.List<java.util.List<T>> | tagAdornedWordSentences(java.util.List<java.util.List<T>> sentences, java.util.Set<java.lang.String> regIDSet) - Tag a list of sentences containing adorned words. |
java.util.List<java.util.List<AdornedWord>> | tagSentences(java.util.List<java.util.List<java.lang.String>> sentences) - Tag a list of sentences. |
java.lang.String | toString() - Return tagger description. |
boolean | usesTransitionProbabilities() - See if tagger uses a probability transition matrix. |
clearRuleCorrections, createPartOfSpeechGuesser, getContextualSmoother, getDynamicLexicon, getLexicalSmoother, getLexicon, getLexicon, getLogger, getMostCommonTag, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setContextualSmoother, setLexicalRules, setLexicalSmoother, setLexicon, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagSentence, usesContextRules, usesLexicalRules
close
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
clearRuleCorrections, getContextualSmoother, getLexicalSmoother, getLexicon, getLexicon, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setContextualSmoother, setLexicalRules, setLexicalSmoother, setLexicon, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagSentence, usesContextRules, usesLexicalRules
close
protected boolean debug
protected Map3D<java.lang.String,java.lang.String,java.lang.String,Probability> contextualProbabilities
protected int beamSearchRejections
protected Viterbi viterbi
protected int linesTagged
protected int wordsTagged
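The `beamSearchRejections` field counts states rejected by a beam search criterion, but this page does not say what that criterion is. As a sketch only, the block below assumes one common choice: a state is rejected when its log score falls more than a fixed width below the best score at the same position. The class name, the `prune` method, and the `beamWidth` parameter are hypothetical and are not part of this API.

```java
import java.util.HashMap;
import java.util.Map;

public class BeamPruneSketch {
    /** Running count of rejected states (plays the role of a rejection counter). */
    int rejections = 0;

    /**
     * Keeps the states whose log score is within beamWidth of the best score
     * and counts every state that is dropped. The threshold rule is an
     * assumption made for this example.
     */
    <S> Map<S, Double> prune(Map<S, Double> scores, double beamWidth) {
        double best = Double.NEGATIVE_INFINITY;
        for (double s : scores.values()) {
            best = Math.max(best, s);
        }
        Map<S, Double> kept = new HashMap<>();
        for (Map.Entry<S, Double> e : scores.entrySet()) {
            if (e.getValue() >= best - beamWidth) {
                kept.put(e.getKey(), e.getValue());
            } else {
                rejections++;  // state falls outside the beam
            }
        }
        return kept;
    }
}
```

Pruning of this kind trades exactness for speed: a narrower beam rejects more states per word, so the tagger can miss the true optimum when the correct path scores poorly early in the sentence.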
public boolean usesTransitionProbabilities()
usesTransitionProbabilities in interface PartOfSpeechTagger
usesTransitionProbabilities in class AbstractPartOfSpeechTagger
protected void reportEndOfTaggingStats()
public java.util.List<java.util.List<AdornedWord>> tagSentences(java.util.List<java.util.List<java.lang.String>> sentences)
tagSentences in interface PartOfSpeechTagger
tagSentences in class AbstractPartOfSpeechTagger
sentences - The list of sentences.
The sentences are a List of Lists of words to be tagged. Each sentence is represented as a list of words. The output is a List of Lists of AdornedWords.
public <T extends AdornedWord> java.util.List<java.util.List<T>> tagAdornedWordSentences(java.util.List<java.util.List<T>> sentences, java.util.Set<java.lang.String> regIDSet)
tagAdornedWordSentences in interface PartOfSpeechTagger
tagAdornedWordSentences in class AbstractPartOfSpeechTagger
sentences - The list of sentences.
regIDSet - Word IDs of words requiring special handling.
The sentences are a List of Lists of adorned words to be tagged. Each sentence is represented as a list of adorned words. The output is a List of Lists of AdornedWords.
public <T extends AdornedWord> java.util.List<T> tagAdornedWordList(java.util.List<T> taggedSentence)
tagAdornedWordList in interface PartOfSpeechTagger
tagAdornedWordList in class AbstractPartOfSpeechTagger
taggedSentence - The sentence as a List of AdornedWords.
The input sentence is a List of AdornedWords to be tagged. The output is the same List of AdornedWords with parts of speech added.
protected java.util.List<java.lang.String> processWord(int wordIndex, java.lang.String word, java.util.List<java.lang.String> previousPreviousTags, java.util.List<java.lang.String> previousTags, java.util.List<java.lang.String> tags)
wordIndex - Index of word in sentence (starts at 0).
word - Word being processed.
previousPreviousTags - The previous word's previous word's tags.
previousTags - The previous word's tags.
tags - The current word's tags.
public void setLogger(Logger logger)
setLogger in interface UsesLogger
setLogger in class AbstractPartOfSpeechTagger
logger - The logger.
public java.lang.String toString()
toString in class java.lang.Object