public abstract class AbstractPartOfSpeechTagger extends IsCloseableObject implements PartOfSpeechTagger, IsCloseable, UsesLexicon, UsesLogger
Provides default implementations for all of the PartOfSpeechTagger interface methods. To create a new part of speech tagger, extend this class and override methods as needed. At a minimum you must override the tagSentence method; tagAdornedWordList is declared abstract, so a concrete subclass must implement it as well.
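The extension pattern described above can be sketched roughly as follows. This is a self-contained illustration only: SketchWord, SketchTaggerBase, and LookupSketchTagger are hypothetical stand-ins, not the library's classes, and the assumption that the default tagSentence delegates to a single abstract hook is made for the sake of the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for AdornedWord: a spelling plus a part of speech tag.
final class SketchWord {
    final String spelling;
    String partOfSpeech;

    SketchWord(String spelling) {
        this.spelling = spelling;
    }
}

// Hypothetical stand-in for the abstract base class: it supplies defaults
// and leaves a single tagging hook for subclasses to implement.
abstract class SketchTaggerBase {
    // Default: wrap each string in a word object, then delegate to the hook.
    // (That the real tagSentence delegates this way is an assumption.)
    List<SketchWord> tagSentence(List<String> sentence) {
        List<SketchWord> words = new ArrayList<>();
        for (String spelling : sentence) {
            words.add(new SketchWord(spelling));
        }
        return tagWordList(words);
    }

    // Default: tag each sentence in turn and collect the results.
    List<List<SketchWord>> tagSentences(List<List<String>> sentences) {
        List<List<SketchWord>> result = new ArrayList<>();
        for (List<String> sentence : sentences) {
            result.add(tagSentence(sentence));
        }
        return result;
    }

    // The one method a concrete tagger must provide in this sketch.
    abstract List<SketchWord> tagWordList(List<SketchWord> sentence);
}

// Toy subclass: assigns each word a fixed tag from a lookup table.
final class LookupSketchTagger extends SketchTaggerBase {
    private final Map<String, String> tagByWord;
    private final String defaultTag;

    LookupSketchTagger(Map<String, String> tagByWord, String defaultTag) {
        this.tagByWord = new HashMap<>(tagByWord);
        this.defaultTag = defaultTag;
    }

    @Override
    List<SketchWord> tagWordList(List<SketchWord> sentence) {
        for (SketchWord word : sentence) {
            word.partOfSpeech = tagByWord.getOrDefault(word.spelling, defaultTag);
        }
        return sentence;
    }
}
```

In this sketch the subclass only fills in the tagging hook; sentence wrapping and multi-sentence handling come from the base class, which mirrors the "override methods as needed" approach.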
Modifier and Type | Field and Description |
---|---|
protected java.lang.String[] | contextRules - Context rules. |
protected ContextualSmoother | contextualSmoother - Contextual smoother. |
protected Lexicon | dynamicLexicon - Dynamic lexicon built on-the-fly for words not in static lexicon. |
protected java.lang.String[] | lexicalRules - Lexical rules. |
protected LexicalSmoother | lexicalSmoother - Lexical smoother. |
protected Lexicon | lexicon - Static lexicon used by tagger. |
protected Logger | logger - Logger used for output. |
protected PartOfSpeechGuesser | partOfSpeechGuesser - Part of speech guesser for words not in lexicon. |
protected PostTokenizer | postTokenizer - PostTokenizer for mapping raw tokens to initial spellings. |
protected PartOfSpeechRetagger | retagger - Fixup retagger. |
protected int | ruleCorrections - Number of corrections applied by rules. |
protected TransitionMatrix | transitionMatrix - Transition matrix used by tagger. |
Constructor and Description |
---|
AbstractPartOfSpeechTagger() - Create tagger. |
Modifier and Type | Method and Description |
---|---|
void | clearRuleCorrections() - Clear count of successful rule applications. |
protected void | createPartOfSpeechGuesser() - Create a part of speech guesser. |
ContextualSmoother | getContextualSmoother() - Get the contextual smoother. |
Lexicon | getDynamicLexicon() - Get the dynamic word lexicon. |
LexicalSmoother | getLexicalSmoother() - Get the lexical smoother. |
Lexicon | getLexicon() - Get the static word lexicon. |
Lexicon | getLexicon(java.lang.String word) - Get the lexicon associated with a specific word. |
Logger | getLogger() - Get the logger. |
java.lang.String | getMostCommonTag(java.lang.String word) - Get the most common tag for a word. |
PartOfSpeechGuesser | getPartOfSpeechGuesser() - Get part of speech guesser. |
PostTokenizer | getPostTokenizer() - Get the postTokenizer. |
PartOfSpeechRetagger | getRetagger() - Get part of speech retagger. |
int | getRuleCorrections() - Get count of successful rule applications. |
int | getTagCount(java.lang.String word, java.lang.String tag) - Get count of times a word appears with a given tag. |
java.util.List<java.lang.String> | getTagsForWord(java.lang.String word) - Get potential part of speech tags for a word. |
TransitionMatrix | getTransitionMatrix() - Get tag transition probabilities matrix. |
void | incrementRuleCorrections() - Increment count of successful rule applications. |
<T extends AdornedWord> java.util.List<T> | retagWords(java.util.List<T> taggedSentence) - Retag words in a tagged sentence. |
void | setContextRules(java.lang.String[] contextRules) - Set context rules for tagging. |
void | setContextualSmoother(ContextualSmoother contextualSmoother) - Set the contextual smoother. |
void | setLexicalRules(java.lang.String[] lexicalRules) - Set lexical rules for tagging. |
void | setLexicalSmoother(LexicalSmoother lexicalSmoother) - Set the lexical smoother. |
void | setLexicon(Lexicon lexicon) - Set the lexicon. |
void | setLogger(Logger logger) - Set the logger. |
void | setPartOfSpeechGuesser(PartOfSpeechGuesser partOfSpeechGuesser) - Set part of speech guesser. |
void | setPostTokenizer(PostTokenizer postTokenizer) - Set the postTokenizer. |
void | setRetagger(PartOfSpeechRetagger retagger) - Set part of speech retagger. |
void | setTransitionMatrix(TransitionMatrix transitionMatrix) - Set tag transition probabilities matrix. |
abstract <T extends AdornedWord> java.util.List<T> | tagAdornedWordList(java.util.List<T> sentence) - Tag a list of adorned words. |
<T extends AdornedWord> java.util.List<T> | tagAdornedWordSentence(java.util.List<T> sentence, java.util.Set<java.lang.String> regIDSet) - Tag a sentence of adorned words. |
<T extends AdornedWord> java.util.List<java.util.List<T>> | tagAdornedWordSentences(java.util.List<java.util.List<T>> sentences, java.util.Set<java.lang.String> regIDSet) - Tag a list of sentences. |
java.util.List<AdornedWord> | tagSentence(java.util.List<java.lang.String> sentence) - Tag a sentence. |
java.util.List<java.util.List<AdornedWord>> | tagSentences(java.util.List<java.util.List<java.lang.String>> sentences) - Tag a list of sentences. |
boolean | usesContextRules() - See if tagger uses context rules. |
boolean | usesLexicalRules() - See if tagger uses lexical rules. |
boolean | usesTransitionProbabilities() - See if tagger uses a probability transition matrix. |
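Methods inherited from class IsCloseableObject: close
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface IsCloseable: close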
protected Lexicon lexicon
protected Lexicon dynamicLexicon
protected TransitionMatrix transitionMatrix
protected java.lang.String[] contextRules
protected java.lang.String[] lexicalRules
protected LexicalSmoother lexicalSmoother
protected ContextualSmoother contextualSmoother
protected PartOfSpeechRetagger retagger
protected PartOfSpeechGuesser partOfSpeechGuesser
protected PostTokenizer postTokenizer
protected int ruleCorrections
protected Logger logger
public Logger getLogger()
getLogger
in interface UsesLogger
public void setLogger(Logger logger)
setLogger
in interface UsesLogger
logger - The logger.
public boolean usesContextRules()
usesContextRules
in interface PartOfSpeechTagger
public boolean usesLexicalRules()
usesLexicalRules
in interface PartOfSpeechTagger
public boolean usesTransitionProbabilities()
usesTransitionProbabilities
in interface PartOfSpeechTagger
public void setContextRules(java.lang.String[] contextRules) throws InvalidRuleException
setContextRules
in interface PartOfSpeechTagger
contextRules - String array of context rules.
InvalidRuleException - if a rule is bad.
For taggers which do not use context rules, this is a no-op.
public void setLexicalRules(java.lang.String[] lexicalRules) throws InvalidRuleException
setLexicalRules
in interface PartOfSpeechTagger
lexicalRules - String array of lexical rules.
InvalidRuleException - if a rule is bad.
For taggers which do not use lexical rules, this is a no-op.
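The no-op behaviour of the rule setters could look roughly like the sketch below. All names here (SketchInvalidRuleException, SketchRuleHolder, SketchRuleBasedHolder, ruleCount) are hypothetical stand-ins, and the "a rule is bad when it is blank" check is an invented placeholder, since the real rule syntax is not described in this page.

```java
import java.util.Arrays;

// Hypothetical stand-in for InvalidRuleException.
class SketchInvalidRuleException extends Exception {
    SketchInvalidRuleException(String message) {
        super(message);
    }
}

// Hypothetical base: a tagger that does not use context rules.
class SketchRuleHolder {
    protected String[] contextRules = new String[0];

    boolean usesContextRules() {
        return false;
    }

    // No-op by default: the rules are ignored and nothing is stored.
    void setContextRules(String[] rules) throws SketchInvalidRuleException {
    }

    int ruleCount() {
        return contextRules.length;
    }
}

// Hypothetical rule-based tagger: validates and stores the rules.
class SketchRuleBasedHolder extends SketchRuleHolder {
    @Override
    boolean usesContextRules() {
        return true;
    }

    @Override
    void setContextRules(String[] rules) throws SketchInvalidRuleException {
        // Validate everything first so a bad rule leaves the old rules intact.
        for (String rule : rules) {
            if (rule == null || rule.trim().isEmpty()) {
                throw new SketchInvalidRuleException("Empty rule");
            }
        }
        contextRules = Arrays.copyOf(rules, rules.length);
    }
}
```

Callers can then pass rules to any tagger without first checking usesContextRules(); only rule-based taggers act on them.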
public Lexicon getLexicon()
getLexicon
in interface UsesLexicon
getLexicon
in interface PartOfSpeechTagger
public Lexicon getDynamicLexicon()
public Lexicon getLexicon(java.lang.String word)
getLexicon
in interface PartOfSpeechTagger
word - The word whose source lexicon is sought.
Most words do not have a source lexicon defined, in which case they come from the main static word lexicon. Usually only words derived by suffix analysis have a source lexicon defined, which is the suffix lexicon.
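A minimal sketch of this fallback, assuming the source lexicon is recorded per word in a map (how the library actually records it is not stated here). SketchLexicon, SketchSourceLexicons, and recordSource are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Lexicon: only a name, to tell instances apart.
final class SketchLexicon {
    final String name;

    SketchLexicon(String name) {
        this.name = name;
    }
}

final class SketchSourceLexicons {
    private final SketchLexicon staticLexicon;
    // Words with an explicitly recorded source lexicon,
    // for example words derived by suffix analysis.
    private final Map<String, SketchLexicon> sourceByWord = new HashMap<>();

    SketchSourceLexicons(SketchLexicon staticLexicon) {
        this.staticLexicon = staticLexicon;
    }

    void recordSource(String word, SketchLexicon source) {
        sourceByWord.put(word, source);
    }

    // Return the recorded source lexicon, or the static lexicon if none exists.
    SketchLexicon getLexicon(String word) {
        return sourceByWord.getOrDefault(word, staticLexicon);
    }
}
```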
public void setLexicon(Lexicon lexicon)
setLexicon
in interface UsesLexicon
setLexicon
in interface PartOfSpeechTagger
lexicon - Lexicon used for tagging.
public TransitionMatrix getTransitionMatrix()
getTransitionMatrix
in interface PartOfSpeechTagger
public void setTransitionMatrix(TransitionMatrix transitionMatrix)
setTransitionMatrix
in interface PartOfSpeechTagger
transitionMatrix - Tag probabilities transition matrix.
For taggers which do not use transition matrices, this is a no-op.
public PartOfSpeechGuesser getPartOfSpeechGuesser()
getPartOfSpeechGuesser
in interface PartOfSpeechTagger
public void setPartOfSpeechGuesser(PartOfSpeechGuesser partOfSpeechGuesser)
setPartOfSpeechGuesser
in interface PartOfSpeechTagger
partOfSpeechGuesser - The part of speech guesser.
public PartOfSpeechRetagger getRetagger()
getRetagger
in interface PartOfSpeechTagger
public void setRetagger(PartOfSpeechRetagger retagger)
setRetagger
in interface PartOfSpeechTagger
retagger - The part of speech retagger.
public PostTokenizer getPostTokenizer()
getPostTokenizer
in interface PartOfSpeechTagger
public void setPostTokenizer(PostTokenizer postTokenizer)
setPostTokenizer
in interface PartOfSpeechTagger
postTokenizer - The postTokenizer.
public ContextualSmoother getContextualSmoother()
getContextualSmoother
in interface PartOfSpeechTagger
public void setContextualSmoother(ContextualSmoother contextualSmoother)
setContextualSmoother
in interface PartOfSpeechTagger
contextualSmoother - The contextual smoother.
public LexicalSmoother getLexicalSmoother()
getLexicalSmoother
in interface PartOfSpeechTagger
public void setLexicalSmoother(LexicalSmoother lexicalSmoother)
setLexicalSmoother
in interface PartOfSpeechTagger
lexicalSmoother - The lexical smoother.
public java.util.List<java.lang.String> getTagsForWord(java.lang.String word)
getTagsForWord
in interface PartOfSpeechTagger
word - The word whose part of speech tags we want.
When the word does not appear in the lexicon, the part of speech guesser is used to determine the tags based upon features of the word (suffix analysis, etc.).
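The lexicon-then-guesser lookup might be sketched as below. SketchTagLookup and its suffix table are hypothetical; a real guesser presumably weighs many more word features than a single suffix match, and the tag names are placeholders rather than the library's tag set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SketchTagLookup {
    // Stand-in for the lexicon: word -> known tags.
    private final Map<String, List<String>> lexiconTags = new HashMap<>();
    // Stand-in for the guesser: suffix -> guessed tags, checked in insertion order.
    private final Map<String, List<String>> suffixTags = new LinkedHashMap<>();

    void addWord(String word, String... tags) {
        lexiconTags.put(word, Arrays.asList(tags));
    }

    void addSuffix(String suffix, String... tags) {
        suffixTags.put(suffix, Arrays.asList(tags));
    }

    List<String> getTagsForWord(String word) {
        // Prefer the lexicon entry when the word is known.
        List<String> known = lexiconTags.get(word);
        if (known != null) {
            return known;
        }
        // Otherwise guess from the first matching suffix.
        for (Map.Entry<String, List<String>> entry : suffixTags.entrySet()) {
            if (word.endsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return Collections.emptyList();
    }
}
```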
public int getTagCount(java.lang.String word, java.lang.String tag)
getTagCount
in interface PartOfSpeechTagger
word - The word.
tag - The part of speech tag.
When the word does not appear in the lexicon, the part of speech guesser is used to compute a count based upon features of the word (suffix analysis, etc.).
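As a rough illustration of count lookup with a guessed fallback, and of deriving a most common tag from those counts, consider the sketch below. SketchTagCounts and its one-rule guessCounts helper are hypothetical, and the assumption that getMostCommonTag simply picks the tag with the largest count is not confirmed by this page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class SketchTagCounts {
    // Stand-in for the lexicon: word -> (tag -> count).
    private final Map<String, Map<String, Integer>> countsByWord = new HashMap<>();

    void add(String word, String tag, int count) {
        countsByWord.computeIfAbsent(word, k -> new TreeMap<>())
                    .merge(tag, count, Integer::sum);
    }

    // Hypothetical guesser: a single invented rule based on the "ing" suffix.
    private Map<String, Integer> guessCounts(String word) {
        Map<String, Integer> guessed = new TreeMap<>();
        guessed.put(word.endsWith("ing") ? "verb" : "noun", 1);
        return guessed;
    }

    // Use lexicon counts when the word is known, guessed counts otherwise.
    private Map<String, Integer> countsFor(String word) {
        Map<String, Integer> known = countsByWord.get(word);
        return (known != null) ? known : guessCounts(word);
    }

    int getTagCount(String word, String tag) {
        return countsFor(word).getOrDefault(tag, 0);
    }

    // Pick the tag with the largest count; ties go to the alphabetically first tag.
    String getMostCommonTag(String word) {
        String best = null;
        int bestCount = -1;
        for (Map.Entry<String, Integer> entry : countsFor(word).entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }
}
```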
public java.lang.String getMostCommonTag(java.lang.String word)
word - The word.
public java.util.List<java.util.List<AdornedWord>> tagSentences(java.util.List<java.util.List<java.lang.String>> sentences)
tagSentences
in interface PartOfSpeechTagger
sentences - The list of sentences.
The sentences are a List of Lists of words to be tagged. Each sentence is represented as a list of words. The output is a list of lists of AdornedWord objects, one inner list per sentence.
public <T extends AdornedWord> java.util.List<java.util.List<T>> tagAdornedWordSentences(java.util.List<java.util.List<T>> sentences, java.util.Set<java.lang.String> regIDSet)
tagAdornedWordSentences
in interface PartOfSpeechTagger
sentences - The list of sentences.
regIDSet - Set of word IDs of words requiring special handling.
The sentences are a List of Lists of adorned words to be tagged. Each sentence is represented as a list of words. The output is a list of lists of AdornedWord objects, one inner list per sentence.
public <T extends AdornedWord> java.util.List<T> retagWords(java.util.List<T> taggedSentence)
retagWords
in interface PartOfSpeechTagger
taggedSentence - The tagged sentence.
This method calls the retagger, if any. If no retagger is defined, the input tagged sentence is returned unchanged. Override this method to add custom retagging without the use of a retagger.
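The "call the retagger if one is defined" behaviour can be sketched as follows. SketchTaggedWord, SketchRetagger, UnknownToNounRetagger, and SketchRetaggingTagger are hypothetical stand-ins; the real PartOfSpeechRetagger interface is not shown in this page, so its method name and signature here are assumptions.

```java
import java.util.List;

// Hypothetical stand-in for AdornedWord.
class SketchTaggedWord {
    final String spelling;
    String tag;

    SketchTaggedWord(String spelling, String tag) {
        this.spelling = spelling;
        this.tag = tag;
    }
}

// Hypothetical stand-in for PartOfSpeechRetagger.
interface SketchRetagger {
    <T extends SketchTaggedWord> List<T> retag(List<T> taggedSentence);
}

// Toy retagger: rewrites the placeholder tag "unknown" to "noun".
final class UnknownToNounRetagger implements SketchRetagger {
    @Override
    public <T extends SketchTaggedWord> List<T> retag(List<T> taggedSentence) {
        for (T word : taggedSentence) {
            if ("unknown".equals(word.tag)) {
                word.tag = "noun";
            }
        }
        return taggedSentence;
    }
}

final class SketchRetaggingTagger {
    // null means that no retagger is defined.
    private SketchRetagger retagger;

    void setRetagger(SketchRetagger retagger) {
        this.retagger = retagger;
    }

    // Return the input unchanged when no retagger is defined.
    <T extends SketchTaggedWord> List<T> retagWords(List<T> taggedSentence) {
        if (retagger == null) {
            return taggedSentence;
        }
        return retagger.retag(taggedSentence);
    }
}
```

A subclass that wants custom fixups without a separate retagger object would override retagWords directly instead of calling setRetagger.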
public void clearRuleCorrections()
clearRuleCorrections
in interface PartOfSpeechTagger
public void incrementRuleCorrections()
incrementRuleCorrections
in interface PartOfSpeechTagger
public int getRuleCorrections()
getRuleCorrections
in interface PartOfSpeechTagger
protected void createPartOfSpeechGuesser()
public java.util.List<AdornedWord> tagSentence(java.util.List<java.lang.String> sentence)
tagSentence
in interface PartOfSpeechTagger
sentence - The sentence as a list of string words.
Returns the words in the sentence as AdornedWord objects tagged with parts of speech.
The input sentence is a List of string words to be tagged. The output is a List of AdornedWord objects for those words with parts of speech added.
public <T extends AdornedWord> java.util.List<T> tagAdornedWordSentence(java.util.List<T> sentence, java.util.Set<java.lang.String> regIDSet)
tagAdornedWordSentence
in interface PartOfSpeechTagger
sentence - The sentence as a list of adorned words.
regIDSet - Set of word IDs of words requiring special handling.
Returns the words in the sentence as AdornedWord objects tagged with parts of speech.
The input sentence is a List of adorned words to be tagged. The output is the same list with spellings, parts of speech, etc. added or modified.
public abstract <T extends AdornedWord> java.util.List<T> tagAdornedWordList(java.util.List<T> sentence)
tagAdornedWordList
in interface PartOfSpeechTagger
sentence - The sentence as a list of AdornedWord objects.