public class Viterbi
extends java.lang.Object
The Viterbi trellis is a (# of words) x (# of part-of-speech tags) matrix containing the transition probability states from one word/tag to the next for each tag. Because this matrix is very sparse, we use a HashMap2D to hold the probability values. We use a second HashMap2D to hold the traceback tags, which allows us to produce the optimal (most probable) path from the last tag in a sentence back to the first.
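The storage scheme described above can be sketched with nested java.util.HashMap instances standing in for HashMap2D (whose API is not shown on this page) and plain doubles standing in for Probability. This is a hypothetical illustration, not this class's implementation: the method names only mirror setScore, getScore and optimalTags, and the hand-filled scores in main are made up.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of a sparse Viterbi trellis: scores and traceback tags
 * are stored only for (word index, tag) pairs that were actually visited.
 * Plain doubles stand in for the Probability class.
 */
class SparseTrellisSketch {
    // word index -> (tag -> best path probability ending in that tag)
    private final Map<Integer, Map<String, Double>> trellis = new HashMap<>();
    // word index -> (tag -> tag of the previous word on that best path)
    private final Map<Integer, Map<String, String>> tracebackTags = new HashMap<>();

    void setScore(int index, String tag, String tracebackTag, double score) {
        trellis.computeIfAbsent(index, k -> new HashMap<>()).put(tag, score);
        tracebackTags.computeIfAbsent(index, k -> new HashMap<>()).put(tag, tracebackTag);
    }

    double getScore(int index, String tag) {
        // An absent entry is a state that was never reached: treat it as 0.
        Map<String, Double> row = trellis.get(index);
        return (row == null) ? 0.0 : row.getOrDefault(tag, 0.0);
    }

    /** Choose the best final tag, then follow traceback tags back to word 0. */
    List<String> optimalTags(int nWords, List<String> finalTags) {
        String tag = finalTags.get(0);
        for (String t : finalTags) {
            if (getScore(nWords - 1, t) > getScore(nWords - 1, tag)) {
                tag = t;
            }
        }
        List<String> path = new ArrayList<>();
        for (int i = nWords - 1; i >= 0; i--) {
            path.add(tag);
            Map<String, String> row = tracebackTags.get(i);
            tag = (row == null) ? null : row.get(tag);
        }
        Collections.reverse(path);  // collected last-to-first, so flip it
        return path;
    }

    public static void main(String[] args) {
        // Hand-filled trellis for "the dog runs"; the scores are illustrative only.
        SparseTrellisSketch v = new SparseTrellisSketch();
        v.setScore(0, "DT", null, 0.9);
        v.setScore(1, "NN", "DT", 0.504);
        v.setScore(1, "VB", "DT", 0.009);
        v.setScore(2, "NN", "NN", 0.03024);
        v.setScore(2, "VB", "NN", 0.1512);
        System.out.println(v.optimalTags(3, List.of("NN", "VB")));  // [DT, NN, VB]
    }
}
```

Only five of the nine (word, tag) cells are ever stored, which is the point of using a sparse map rather than a dense array.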
To improve performance, we define a beam width that allows us to eliminate trellis states which are highly unlikely to be part of the optimal tag path. The default beam width is 1000, which appears to work well in practice.
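As a worked illustration of the default beam, assuming scores behave as plain probabilities and using the ratio test described under pruneTags: let $\delta_i(t)$ be the trellis score for tag $t$ at word $i$ and $B$ the beam width. A tag stays in the trellis only if

$$
\frac{\max_{t'} \delta_i(t')}{\delta_i(t)} \le B, \qquad B = 1000.
$$

With a best score of $0.5$, a tag scoring $4 \times 10^{-4}$ gives a ratio of $1250 > 1000$ and is pruned, while a tag scoring $0.01$ gives a ratio of $50$ and survives.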
Modifier and Type | Field and Description |
---|---|
protected int | beamSearchRejections: Count of tags rejected by beam search. |
protected double | beamWidth: Beam width for beam search. |
protected Logger | logger: Logger used for output. |
protected Map2D<java.lang.Integer,java.lang.String,java.lang.String> | tracebackTags: Viterbi traceback for tags. |
protected Map2D<java.lang.Integer,java.lang.String,Probability> | trellis: Viterbi probability trellis. |
Constructor and Description |
---|
Viterbi(): Create Viterbi object. |
Modifier and Type | Method and Description |
---|---|
double | beamWidth(): Get the beam width. |
void | beamWidth(double beamWidth): Set the beam width. |
int | getBeamSearchRejections(): Return the number of entries rejected by beam search. |
Logger | getLogger(): Get the logger. |
Probability | getScore(int index, java.lang.String tag): Get the probability value for a specified word index and tag. |
java.lang.String | getTracebackTag(int index, java.lang.String tag): Get the traceback tag for a specified tag and word index. |
java.util.List<java.lang.String> | optimalTags(int nWords, java.util.List<java.lang.String> tags): Get the optimal set of tags via backtracking. |
protected java.util.List<java.lang.String> | pruneTags(int wordIndex, java.util.List<java.lang.String> tags, Probability bestScore): Prune tags using beam search. |
void | reset(): Reset the Viterbi object to a clean state. |
void | setLogger(Logger logger): Set the logger. |
void | setScore(int index, java.lang.String tag, java.lang.String tracebackTag, Probability score): Store the Viterbi score and traceback tag for a specified word index and tag. |
java.util.List<java.lang.String> | updateScore(int wordIndex, Probability[] lexicalProbs, Map2D contextualProbs, java.util.List<java.lang.String> tags, java.util.List<java.lang.String> prevTags): Perform Viterbi scoring for a bigram model. |
java.util.List<java.lang.String> | updateScore(int wordIndex, Probability[] lexicalProbs, Map3D contextualProbs, java.util.List<java.lang.String> tags, java.util.List<java.lang.String> prevTags, java.util.List<java.lang.String> prevPrevTags): Perform Viterbi scoring for a trigram model. |
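The bigram updateScore is documented only as "Viterbi scoring". The textbook first-order recurrence is given here as an assumption about what it computes: lexicalProbs[k] is taken to supply $P(w_i \mid t_k)$ for the k-th entry of tags, contextualProbs is taken to supply the tag-transition probability $P(t \mid t')$, $\delta_i(t)$ corresponds to getScore(i, t), and $\psi_i(t)$ corresponds to getTracebackTag(i, t).

$$
\delta_i(t) = P(w_i \mid t)\,\max_{t' \in \mathrm{prevTags}} \bigl[\delta_{i-1}(t')\,P(t \mid t')\bigr],
\qquad
\psi_i(t) = \operatorname*{arg\,max}_{t' \in \mathrm{prevTags}} \bigl[\delta_{i-1}(t')\,P(t \mid t')\bigr]
$$

Backtracking over $n$ words then reads the path off the traceback tags:

$$
\hat t_{n-1} = \operatorname*{arg\,max}_{t \in \mathrm{tags}} \delta_{n-1}(t),
\qquad
\hat t_{i-1} = \psi_i(\hat t_i) \quad (i = n-1, \dots, 1).
$$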
protected Map2D<java.lang.Integer,java.lang.String,Probability> trellis
protected Map2D<java.lang.Integer,java.lang.String,java.lang.String> tracebackTags
protected double beamWidth
protected int beamSearchRejections
protected Logger logger
public void reset()
public Probability getScore(int index, java.lang.String tag)
Parameters:
index - Word index.
tag - Part of speech tag.
public java.lang.String getTracebackTag(int index, java.lang.String tag)
Parameters:
index - Word index.
tag - Part of speech tag.
public void setScore(int index, java.lang.String tag, java.lang.String tracebackTag, Probability score)
Parameters:
index - Word index.
tag - Part of speech tag.
tracebackTag - Traceback tag to store.
score - Score to store.
public java.util.List<java.lang.String> updateScore(int wordIndex, Probability[] lexicalProbs, Map2D contextualProbs, java.util.List<java.lang.String> tags, java.util.List<java.lang.String> prevTags)
Parameters:
wordIndex - Word index of the current word.
lexicalProbs - Array of lexical probabilities. Entries correspond one-to-one with the tags in the "tags" parameter.
contextualProbs - HashMap2D mapping words and tags to contextual probabilities.
tags - Possible tags for the current word.
prevTags - Possible tags for the previous word.
public java.util.List<java.lang.String> updateScore(int wordIndex, Probability[] lexicalProbs, Map3D contextualProbs, java.util.List<java.lang.String> tags, java.util.List<java.lang.String> prevTags, java.util.List<java.lang.String> prevPrevTags)
Parameters:
wordIndex - Word index of the current word.
lexicalProbs - Array of lexical probabilities. Entries correspond one-to-one with the tags in the "tags" parameter.
contextualProbs - HashMap3D mapping words and tags to contextual probabilities.
tags - Possible tags for the current word.
prevTags - Possible tags for the previous word.
prevPrevTags - Possible tags for the word before the previous word.
protected java.util.List<java.lang.String> pruneTags(int wordIndex, java.util.List<java.lang.String> tags, Probability bestScore)
Parameters:
wordIndex - The word index.
tags - The tags to prune.
bestScore - The best score for this word and set of tags.
Compares the ratio of the best score across all the tags to each individual tag's score. Tags whose ratio exceeds the beam width are removed from further consideration in succeeding states.
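The ratio test just described can be sketched as follows. This is a minimal illustration, not the class's code: it assumes plain double scores passed in a hypothetical Map<String, Double> for the current word (the real method looks them up in the trellis via wordIndex), and BeamPruneSketch is an illustrative name whose counter mirrors beamSearchRejections.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of beam pruning by score ratio. */
class BeamPruneSketch {
    double beamWidth = 1000.0;     // default quoted in the class description
    int beamSearchRejections = 0;  // running count of pruned tags

    /** Keep tags whose score is within a factor of beamWidth of bestScore. */
    List<String> pruneTags(Map<String, Double> scores, List<String> tags, double bestScore) {
        List<String> kept = new ArrayList<>();
        for (String tag : tags) {
            double score = scores.getOrDefault(tag, 0.0);
            // ratio = best / score; if score is 0 and bestScore > 0 the ratio is
            // Infinity, so the tag is pruned
            double ratio = bestScore / score;
            if (ratio > beamWidth) {
                beamSearchRejections++;
            } else {
                kept.add(tag);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        BeamPruneSketch p = new BeamPruneSketch();
        Map<String, Double> scores = Map.of("NN", 0.02, "VB", 1.5e-5, "JJ", 0.001);
        // VB: 0.02 / 1.5e-5 is about 1333 > 1000, so VB is pruned; JJ's ratio is 20
        System.out.println(p.pruneTags(scores, List.of("NN", "VB", "JJ"), 0.02));  // [NN, JJ]
        System.out.println(p.beamSearchRejections);  // 1
    }
}
```

Testing the ratio rather than an absolute threshold keeps the beam meaningful as path probabilities shrink along a long sentence.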
public java.util.List<java.lang.String> optimalTags(int nWords, java.util.List<java.lang.String> tags)
Parameters:
nWords - Number of words.
tags - Final state tags.
public int getBeamSearchRejections()
public double beamWidth()
public void beamWidth(double beamWidth)
Parameters:
beamWidth - The beam width.
public Logger getLogger()
public void setLogger(Logger logger)
Parameters:
logger - The logger.