public class SimpleRuleBasedTagger extends UnigramTagger implements PartOfSpeechTagger, PartOfSpeechRetagger
The simple rule-based part-of-speech tagger assigns each word its most commonly occurring part of speech and then applies a small set of contextual rules to "fix up" the tagging. It is a lightweight variant of the Brill tagger, a kind of "Brill light."
This simple tagger is useful when very fast tagging matters more than high accuracy, e.g., in sentence splitting.
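The two-pass idea described above can be illustrated with a minimal, self-contained sketch. It does not use or reproduce the tagger's actual lexicon or rules: the class `BrillLightSketch`, the tiny word-to-tag map, the Penn-Treebank-style tag names, and the single "noun after TO becomes verb" rule are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BrillLightSketch {
    // Hypothetical unigram lexicon: each word mapped to its most frequent tag.
    static final Map<String, String> MOST_COMMON_TAG = Map.of(
        "we", "PRP",
        "want", "VBP",
        "to", "TO",
        "the", "DT",
        "run", "NN");

    // Pass 1: give every word its most common tag; unknown words default to noun.
    static List<String> initialTags(List<String> words) {
        List<String> tags = new ArrayList<>();
        for (String word : words) {
            tags.add(MOST_COMMON_TAG.getOrDefault(word.toLowerCase(), "NN"));
        }
        return tags;
    }

    // Pass 2: apply one illustrative contextual rule:
    // change NN to VB when the previous tag is TO.
    static List<String> applyContextRules(List<String> tags) {
        List<String> fixed = new ArrayList<>(tags);
        for (int i = 1; i < fixed.size(); i++) {
            if (fixed.get(i).equals("NN") && fixed.get(i - 1).equals("TO")) {
                fixed.set(i, "VB");
            }
        }
        return fixed;
    }

    static List<String> tag(List<String> words) {
        return applyContextRules(initialTags(words));
    }

    public static void main(String[] args) {
        // prints [PRP, VBP, TO, VB]
        System.out.println(tag(List.of("we", "want", "to", "run")));
    }
}
```

Both passes are single linear scans with no probability computation, which is the kind of trade-off (speed over accuracy) the description above refers to.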
contextRules, contextualSmoother, dynamicLexicon, lexicalRules, lexicalSmoother, lexicon, logger, partOfSpeechGuesser, postTokenizer, retagger, ruleCorrections, transitionMatrix
Constructor and Description |
---|
SimpleRuleBasedTagger(): Create a simple rule-based tagger. |
Modifier and Type | Method and Description |
---|---|
boolean | getCanAddOrDeleteWords(): Can the retagger add or delete words in the original sentence? |
<T extends AdornedWord> java.util.List<T> | retagSentence(java.util.List<T> sentence): Retag a sentence. |
<T extends AdornedWord> java.util.List<T> | retagWords(java.util.List<T> taggedSentence): Retag words in a tagged sentence. |
void | setCanAddOrDeleteWords(boolean canAddOrDeleteWords): Set whether the retagger can add or delete words in the original sentence. Ignored here. |
java.util.List<AdornedWord> | tagSentence(java.util.List<java.lang.String> sentence): Tag a sentence. |
java.lang.String | toString(): Return tagger description. |
tagAdornedWordList, tagWord, tagWord
clearRuleCorrections, createPartOfSpeechGuesser, getContextualSmoother, getDynamicLexicon, getLexicalSmoother, getLexicon, getLexicon, getLogger, getMostCommonTag, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, setContextRules, setContextualSmoother, setLexicalRules, setLexicalSmoother, setLexicon, setLogger, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagAdornedWordSentences, tagSentences, usesContextRules, usesLexicalRules, usesTransitionProbabilities
close
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
clearRuleCorrections, getContextualSmoother, getLexicalSmoother, getLexicon, getLexicon, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, setContextRules, setContextualSmoother, setLexicalRules, setLexicalSmoother, setLexicon, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordList, tagAdornedWordSentence, tagAdornedWordSentences, tagSentences, usesContextRules, usesLexicalRules, usesTransitionProbabilities
close
public SimpleRuleBasedTagger()
public java.util.List<AdornedWord> tagSentence(java.util.List<java.lang.String> sentence)
tagSentence in interface PartOfSpeechTagger
tagSentence in class AbstractPartOfSpeechTagger
sentence - The sentence as a list of string words.
Returns a list of AdornedWord entries for the words in the sentence, tagged with parts of speech.
The input sentence is a List of string words to be tagged. The output is a list of AdornedWord entries for the words, with parts of speech added.
public <T extends AdornedWord> java.util.List<T> retagWords(java.util.List<T> taggedSentence)
retagWords in interface PartOfSpeechTagger
retagWords in class AbstractPartOfSpeechTagger
taggedSentence - The tagged sentence.
This method applies the short list of fixup rules. The resultant tagging is crude but good enough for tasks like sentence boundary detection.
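The shape of such a retagging pass, with the same generic form as the `retagWords` signature, might look like the sketch below. The `TaggedWord` class is a hypothetical stand-in rather than the real `AdornedWord` interface, the single "verb after determiner becomes noun" rule is invented for illustration, and modifying the list in place is an assumption about behaviour, not something the documentation above states.

```java
import java.util.ArrayList;
import java.util.List;

public class RetagSketch {
    // Hypothetical stand-in for a tagged word; not the library's AdornedWord.
    static class TaggedWord {
        final String spelling;
        String tag;

        TaggedWord(String spelling, String tag) {
            this.spelling = spelling;
            this.tag = tag;
        }
    }

    // The generic bound keeps the caller's element type: List<T> in, List<T> out.
    static <T extends TaggedWord> List<T> retagWords(List<T> taggedSentence) {
        for (int i = 1; i < taggedSentence.size(); i++) {
            T previous = taggedSentence.get(i - 1);
            T current = taggedSentence.get(i);
            // Illustrative fixup rule: a base-form verb right after a
            // determiner is retagged as a noun.
            if (current.tag.equals("VB") && previous.tag.equals("DT")) {
                current.tag = "NN";
            }
        }
        // No words are added or removed; only tags change (assumed here).
        return taggedSentence;
    }

    public static void main(String[] args) {
        List<TaggedWord> sentence = new ArrayList<>();
        sentence.add(new TaggedWord("the", "DT"));
        sentence.add(new TaggedWord("run", "VB"));
        retagWords(sentence);
        // prints run/NN
        System.out.println(sentence.get(1).spelling + "/" + sentence.get(1).tag);
    }
}
```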
public <T extends AdornedWord> java.util.List<T> retagSentence(java.util.List<T> sentence)
retagSentence in interface PartOfSpeechRetagger
sentence - The sentence as a list of AdornedWord entries.
public boolean getCanAddOrDeleteWords()
getCanAddOrDeleteWords in interface PartOfSpeechRetagger
public void setCanAddOrDeleteWords(boolean canAddOrDeleteWords)
setCanAddOrDeleteWords in interface PartOfSpeechRetagger
canAddOrDeleteWords - true if the retagger can add or delete words. Ignored here.
public java.lang.String toString()
toString in class UnigramTagger