|
Assume that the part of speech tag for a
word depends only upon the previous one or two tags, and
that the probability of this tag does not depend upon the probabilities
of subsequent tags. How do we find the most probable sequence of
tags corresponding to a particular sequence of words?
We can look at the sequence of part of speech tags for words
as an instance of a Hidden Markov Model HMM).
Finding the most probable part of speech tag sequence
amounts to finding the most probable sequence of states
which the Hidden Markov Model traverses for a particular
sequence of words.
The Trigram Tagger in MorphAdorner seeks the most likely part of speech
tag for a word given information about the previous two words.
More precisely, the tag sequence t1, t2, ...,
tn -- corresponding to the word sequence
w1, w2, ..., wn --
is sought which maximizes the following expression:
P(t1)P(t2|t1)PRODUCT(i=3 to n)
P(ti|ti-2,ti-1)PRODUCT(i=1 to n)
P(wi|ti)
where
| P(t) |
Contextual (tag transition) probability |
| P(w|t) |
Lexical (word emission) probability |
Each P(wi|ti) is estimated using the
maximum likelihood estimator:
PMLE(ti|ti-2,ti-1) =
C(ti-2,ti-1,ti) /
C(ti-2,ti-1)
where C(x) is the observed single or joint frequency for the words
or tags. To account for "holes" in the frequencies, where some
possible combinations are not observed, we can compute smoothed
probabilities which reduce the maximum likelihood estimates a little
bit to allow a bit of the overall probability to be assigned to
unobserved combinations. By default MorphAdorner uses a simple
additive smoother which adds small positive values to the
numerator and denominator of the probability estimate above.
The numerator value is a small constant such as 0.05, while the
denominator value is the numerator constant multiplied by the
size of the word lexicon L.
Psmoothed(ti|ti-2,ti-1) =
(C(ti-2,ti-1,ti) + 0.05) /
(C(ti-2,ti-1) + (0.05 * L))
This works well when the training
data size is large (as it is for MorphAdorner). For smaller
training sets, deleted interpolation may prove more effective.
MorphAdorner provides a deleted interpolation smoother as an
option.
Each P(wi|ti) is estimated using the
maximum likelihood estimator:
PMLE(wi|ti) =
C(wi , ti) / C(ti)
Again, MorphAdorner smooths the maximum likelihood estimates
using additive smoothing. This time 0.5 is used as the additive
numerator value, and the denominator multiplier is K,
the number of potential part of speech tags for the word wi.
This is commonly called Lidstone smoothing.
Psmoothed(wi|ti) =
(C(wi , ti) + 0.5) /
(C(ti) + (0.5 * K))
It is possible but inefficient to calculate the probabilities using
complete enumeration of all possible values of
t1, t2, ..., tn.
In the worst case the time complexity is nT where T
is the number of part of speech tags in the tag set.
MorphAdorner applies the Viterbi algorithm, a dynamic programming
algorithm to determine the optimal subpaths for each state in the
HMM model as the algorithm traverses the model. Subpaths
other than the most probable are discarded. MorphAdorner also restricts
the search path even further using a beam search which discards paths
whose probabilities are too small compared to a specified tolerance
value to contribute significantly to the joint probability.
|