public class BigramTagger extends AbstractPartOfSpeechTagger implements PartOfSpeechTagger
The bigram part-of-speech tagger assigns tags to the words in a sentence by choosing the most probable sequence of tags under a bigram hidden Markov model, in which the probability of each tag is conditioned on the possible tags of the previous word. The Viterbi algorithm is used to reduce the amount of computation required to find the optimal tag assignment.
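The decoding idea described above can be illustrated with a minimal, self-contained sketch. It is not the code of this class: the class name `BigramViterbiSketch`, the `<s>` start pseudo-tag, the probability floor used in place of smoothing, and the toy probability tables are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Illustrative bigram HMM decoder; NOT the BigramTagger implementation. */
final class BigramViterbiSketch {
    static final String START = "<s>";  // hypothetical sentence-start pseudo-tag
    static final double FLOOR = 1e-6;   // crude stand-in for real smoothing

    /** Looks up table[a][b], falling back to FLOOR for unseen events. */
    static double prob(Map<String, Map<String, Double>> table, String a, String b) {
        return table.getOrDefault(a, Collections.emptyMap()).getOrDefault(b, FLOOR);
    }

    /**
     * Returns the most probable tag sequence for the words.
     * transition[prevTag][tag] = P(tag | prevTag); emission[tag][word] = P(word | tag).
     */
    static List<String> decode(List<String> words, List<String> tagSet,
            Map<String, Map<String, Double>> transition,
            Map<String, Map<String, Double>> emission) {
        int n = words.size();
        int k = tagSet.size();
        if (n == 0) {
            return new ArrayList<>();
        }
        double[][] score = new double[n][k]; // best log probability ending in tag j at word i
        int[][] back = new int[n][k];        // index of the previous tag on that best path

        for (int j = 0; j < k; j++) {
            score[0][j] = Math.log(prob(transition, START, tagSet.get(j)))
                    + Math.log(prob(emission, tagSet.get(j), words.get(0)));
        }
        for (int i = 1; i < n; i++) {
            for (int j = 0; j < k; j++) {
                double best = Double.NEGATIVE_INFINITY;
                int bestPrev = 0;
                for (int p = 0; p < k; p++) {
                    double s = score[i - 1][p]
                            + Math.log(prob(transition, tagSet.get(p), tagSet.get(j)));
                    if (s > best) {
                        best = s;
                        bestPrev = p;
                    }
                }
                score[i][j] = best + Math.log(prob(emission, tagSet.get(j), words.get(i)));
                back[i][j] = bestPrev;
            }
        }
        // Pick the best final tag, then follow the back pointers to the start.
        int last = 0;
        for (int j = 1; j < k; j++) {
            if (score[n - 1][j] > score[n - 1][last]) {
                last = j;
            }
        }
        String[] result = new String[n];
        for (int i = n - 1; i >= 0; i--) {
            result[i] = tagSet.get(last);
            last = back[i][last];
        }
        return Arrays.asList(result);
    }

    /** Decodes a toy sentence with made-up probabilities. */
    static List<String> demo() {
        Map<String, Map<String, Double>> transition = Map.of(
                START, Map.of("DT", 0.8, "NN", 0.2),
                "DT", Map.of("NN", 0.9),
                "NN", Map.of("VBZ", 0.6, "NN", 0.3),
                "VBZ", Map.of("DT", 0.5));
        Map<String, Map<String, Double>> emission = Map.of(
                "DT", Map.of("the", 0.7),
                "NN", Map.of("dog", 0.4, "barks", 0.05),
                "VBZ", Map.of("barks", 0.3));
        return decode(List.of("the", "dog", "barks"), List.of("DT", "NN", "VBZ"),
                transition, emission);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [DT, NN, VBZ]
    }
}
```

The sketch works in log space to avoid underflow on long sentences and keeps one back pointer per trellis cell, so the cost grows with the sentence length times the square of the tag set size rather than with the number of possible tag sequences.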
Modifier and Type | Field and Description |
---|---|
protected int | beamSearchRejections - Total number of states rejected by beam search criterion. |
protected Map2D<java.lang.String,java.lang.String,Probability> | contextualProbabilities - Contextual probabilities for a word in a sentence. |
protected boolean | debug - True for debug output. |
protected Viterbi | viterbi - Viterbi trellis for tags and probability scores. |
contextRules, contextualSmoother, dynamicLexicon, lexicalRules, lexicalSmoother, lexicon, logger, partOfSpeechGuesser, postTokenizer, retagger, ruleCorrections, transitionMatrix
Constructor and Description |
---|
BigramTagger() - Create a bigram tagger. |
Modifier and Type | Method and Description |
---|---|
protected java.util.List<java.lang.String> | processWord(int wordIndex, java.lang.String word, java.util.List<java.lang.String> previousTags, java.util.List<java.lang.String> tags) - Process a single word. |
void | setLogger(Logger logger) - Set the logger. |
<T extends AdornedWord> java.util.List<T> | tagAdornedWordList(java.util.List<T> taggedSentence) - Tag a sentence. |
java.util.List<java.util.List<AdornedWord>> | tagSentences(java.util.List<java.util.List<java.lang.String>> sentences) - Tag a list of sentences. |
java.lang.String | toString() - Return tagger description. |
boolean | usesTransitionProbabilities() - See if tagger uses a probability transition matrix. |
clearRuleCorrections, createPartOfSpeechGuesser, getContextualSmoother, getDynamicLexicon, getLexicalSmoother, getLexicon, getLexicon, getLogger, getMostCommonTag, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setContextualSmoother, setLexicalRules, setLexicalSmoother, setLexicon, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, usesContextRules, usesLexicalRules
close
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
clearRuleCorrections, getContextualSmoother, getLexicalSmoother, getLexicon, getLexicon, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setContextualSmoother, setLexicalRules, setLexicalSmoother, setLexicon, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, usesContextRules, usesLexicalRules
close
protected boolean debug
protected Map2D<java.lang.String,java.lang.String,Probability> contextualProbabilities
protected int beamSearchRejections
protected Viterbi viterbi
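The `beamSearchRejections` field counts states rejected by a beam search criterion. This page does not say which criterion the tagger applies, so the following is only a sketch of one common choice: a state is dropped when its log score falls more than a fixed width below the best score in the same trellis column. The class name `BeamPruneSketch` and the `beamWidth` parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative beam pruning for one trellis column; the criterion is an assumption. */
final class BeamPruneSketch {
    int beamSearchRejections = 0; // mirrors the name of the documented counter
    final double beamWidth;       // hypothetical: largest allowed log-score gap from the best state

    BeamPruneSketch(double beamWidth) {
        this.beamWidth = beamWidth;
    }

    /** Returns the indices of the states that survive pruning and counts the rest. */
    List<Integer> prune(double[] logScores) {
        double best = Double.NEGATIVE_INFINITY;
        for (double s : logScores) {
            best = Math.max(best, s);
        }
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < logScores.length; i++) {
            if (best - logScores[i] <= beamWidth) {
                kept.add(i);
            } else {
                beamSearchRejections++;
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Keep states within a factor of 1000 of the best probability.
        BeamPruneSketch beam = new BeamPruneSketch(Math.log(1000.0));
        double[] column = { Math.log(0.2), Math.log(0.05), Math.log(1e-7) };
        System.out.println(beam.prune(column) + " rejected=" + beam.beamSearchRejections);
    }
}
```

Pruning trades exactness for speed: a rejected state can no longer appear on the final path, so a narrow beam may miss the true optimum, while a wide beam rejects almost nothing.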
public boolean usesTransitionProbabilities()
Specified by: usesTransitionProbabilities in interface PartOfSpeechTagger
Overrides: usesTransitionProbabilities in class AbstractPartOfSpeechTagger
public java.util.List<java.util.List<AdornedWord>> tagSentences(java.util.List<java.util.List<java.lang.String>> sentences)
Specified by: tagSentences in interface PartOfSpeechTagger
Overrides: tagSentences in class AbstractPartOfSpeechTagger
Parameters:
sentences - The list of sentences.
The sentences are a List of Lists of words to be tagged. Each sentence is represented as a list of words. The output is a list of tagged sentences, each of which is a List of AdornedWords.
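The nested-list input shape can be built as in the sketch below. The helper class `TagSentencesInputSketch` and its whitespace splitting are assumptions for illustration; a real pipeline would normally use a proper tokenizer, and the tagger call is shown only as a comment because it needs a tagger that has already been configured.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative helper that builds the List-of-Lists input shape. */
final class TagSentencesInputSketch {
    /** Splits each non-empty sentence on whitespace into a list of words. */
    static List<List<String>> toWordLists(List<String> rawSentences) {
        List<List<String>> sentences = new ArrayList<>();
        for (String raw : rawSentences) {
            String trimmed = raw.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank sentences rather than emit an empty word
            }
            sentences.add(Arrays.asList(trimmed.split("\\s+")));
        }
        return sentences;
    }

    public static void main(String[] args) {
        List<List<String>> sentences =
                toWordLists(List.of("The dog barks .", "It sleeps ."));
        System.out.println(sentences);
        // With an already configured tagger, the documented call would be:
        //   List<List<AdornedWord>> tagged = tagger.tagSentences(sentences);
    }
}
```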
public <T extends AdornedWord> java.util.List<T> tagAdornedWordList(java.util.List<T> taggedSentence)
Specified by: tagAdornedWordList in interface PartOfSpeechTagger
Overrides: tagAdornedWordList in class AbstractPartOfSpeechTagger
Parameters:
taggedSentence - The sentence as a List of AdornedWord.
Returns:
A List of AdornedWord of the words in the sentence tagged with parts of speech.
The input sentence is a List of AdornedWord objects to be tagged. The output is a List of AdornedWord of the words with parts of speech added.
protected java.util.List<java.lang.String> processWord(int wordIndex, java.lang.String word, java.util.List<java.lang.String> previousTags, java.util.List<java.lang.String> tags)
Parameters:
wordIndex - Index of word in sentence (starts at 0).
word - Word being processed.
previousTags - The previous word's tags.
tags - The current word's tags.
public void setLogger(Logger logger)
Specified by: setLogger in interface UsesLogger
Overrides: setLogger in class AbstractPartOfSpeechTagger
Parameters:
logger - The logger.
public java.lang.String toString()
Overrides: toString in class java.lang.Object