public class RegexpTagger extends UnigramTagger implements PartOfSpeechTagger, CanTagOneWord
The regular expression part of speech tagger uses regular expressions to assign a part of speech tag to a spelling.
Modifier and Type | Field and Description |
---|---|
protected java.util.regex.Matcher[] | regexpMatchers |
protected java.util.regex.Pattern[] | regexpPatterns: Parts of speech for each lexical rule. |
protected java.lang.String[] | regexpTags |
Fields inherited from superclasses: contextRules, contextualSmoother, dynamicLexicon, lexicalRules, lexicalSmoother, lexicon, logger, partOfSpeechGuesser, postTokenizer, retagger, ruleCorrections, transitionMatrix
Constructor and Description |
---|
RegexpTagger(): Create a regular expression tagger. |
Modifier and Type | Method and Description |
---|---|
void | setLexicalRules(java.lang.String[] lexicalRules): Set lexical rules for tagging. |
java.lang.String | tagWord(java.lang.String word): Tag a single word. |
java.lang.String | toString(): Return tagger description. |
boolean | usesLexicalRules(): See if tagger uses lexical rules. |
Methods inherited from class UnigramTagger: tagAdornedWordList, tagWord
Methods inherited from class AbstractPartOfSpeechTagger: clearRuleCorrections, createPartOfSpeechGuesser, getContextualSmoother, getDynamicLexicon, getLexicalSmoother, getLexicon, getLexicon, getLogger, getMostCommonTag, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setContextualSmoother, setLexicalSmoother, setLexicon, setLogger, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, tagSentences, usesContextRules, usesTransitionProbabilities
Methods inherited from a further superclass: close
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface PartOfSpeechTagger: clearRuleCorrections, getContextualSmoother, getLexicalSmoother, getLexicon, getLexicon, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setContextualSmoother, setLexicalSmoother, setLexicon, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordList, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, tagSentences, usesContextRules, usesTransitionProbabilities
Methods inherited from interface CanTagOneWord: tagWord
Methods inherited from a further interface: close
protected java.util.regex.Pattern[] regexpPatterns
protected java.util.regex.Matcher[] regexpMatchers
protected java.lang.String[] regexpTags
public boolean usesLexicalRules()
See if tagger uses lexical rules.
Specified by: usesLexicalRules in interface PartOfSpeechTagger
Overrides: usesLexicalRules in class AbstractPartOfSpeechTagger
public void setLexicalRules(java.lang.String[] lexicalRules) throws InvalidRuleException
Set lexical rules for tagging.
Specified by: setLexicalRules in interface PartOfSpeechTagger
Overrides: setLexicalRules in class AbstractPartOfSpeechTagger
Parameters: lexicalRules - String array of lexical rules.
Throws: InvalidRuleException - if a rule is bad.
For the regular expression tagger, each rule takes the form:
regular-expression \t part-of-speech-tag
where "regular-expression" is the regular expression and "part-of-speech-tag" is the part of speech tag to assign to a spelling matched by that regular expression. An ASCII tab character (\t) separates the pattern from the tag.
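A minimal sketch of how rules in this format could be parsed into compiled patterns and their tags. The class ParsedRules and its parse method are hypothetical, not part of the tagger's API. The sketch throws IllegalArgumentException where the real setLexicalRules throws InvalidRuleException, and it assumes the first tab on the line is the separator.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper (not the tagger's API): turns rules of the form
// "regular-expression<TAB>part-of-speech-tag" into parallel arrays.
final class ParsedRules {
    final Pattern[] patterns; // compiled regular expression for rule i
    final String[] tags;      // part of speech tag for rule i

    private ParsedRules(Pattern[] patterns, String[] tags) {
        this.patterns = patterns;
        this.tags = tags;
    }

    // Throws IllegalArgumentException for a bad rule; this stands in for
    // the InvalidRuleException thrown by the real setLexicalRules.
    static ParsedRules parse(String[] lexicalRules) {
        Pattern[] patterns = new Pattern[lexicalRules.length];
        String[] tags = new String[lexicalRules.length];
        for (int i = 0; i < lexicalRules.length; i++) {
            String rule = lexicalRules[i];
            // Assumption: the first tab separates the pattern from the tag.
            int tab = rule.indexOf('\t');
            if (tab <= 0) {
                throw new IllegalArgumentException("Missing pattern or tab: " + rule);
            }
            String tag = rule.substring(tab + 1).trim();
            if (tag.isEmpty()) {
                throw new IllegalArgumentException("Missing tag: " + rule);
            }
            try {
                patterns[i] = Pattern.compile(rule.substring(0, tab));
            } catch (PatternSyntaxException e) {
                throw new IllegalArgumentException("Bad regular expression: " + rule, e);
            }
            tags[i] = tag;
        }
        // Reached only if every rule parsed, so a bad rule never yields a partial result.
        return new ParsedRules(patterns, tags);
    }
}
```

For example, ParsedRules.parse(new String[] { "[0-9]+\tCD", ".*ing\tVBG" }) yields two compiled patterns with the tags CD and VBG. These Penn Treebank style tags are illustrative only and are not necessarily the tag set this tagger uses.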
public java.lang.String tagWord(java.lang.String word)
Tag a single word.
Applies, in turn, each regular expression stored from the lexical rules and returns the tag associated with the first regular expression that matches.
Specified by: tagWord in interface CanTagOneWord
Overrides: tagWord in class UnigramTagger
Parameters: word - The word.
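The first-match lookup can be sketched as follows. FirstMatchTagger is a hypothetical class that keeps parallel Matcher and tag arrays, in the spirit of the regexpMatchers and regexpTags fields, and reuses each Matcher through Matcher.reset instead of allocating a new one per word. Two behaviors are assumptions, not confirmed by this page: a pattern must match the whole word (matches() rather than find()), and null is returned when no pattern matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of first-match regular expression tagging.
class FirstMatchTagger {
    private final Matcher[] matchers; // one reusable Matcher per rule
    private final String[] tags;      // tag for the rule at the same index

    FirstMatchTagger(String[] regexes, String[] tags) {
        if (regexes.length != tags.length) {
            throw new IllegalArgumentException("Each regular expression needs exactly one tag");
        }
        matchers = new Matcher[regexes.length];
        for (int i = 0; i < regexes.length; i++) {
            // Create the Matcher once against an empty input; reset() rebinds it later.
            matchers[i] = Pattern.compile(regexes[i]).matcher("");
        }
        this.tags = tags.clone();
    }

    // Returns the tag of the first rule whose pattern matches the entire word,
    // or null if no rule matches (the null fallback is an assumption).
    String tagWord(String word) {
        for (int i = 0; i < matchers.length; i++) {
            if (matchers[i].reset(word).matches()) {
                return tags[i];
            }
        }
        return null;
    }
}
```

With the rules "[0-9]+", ".*ing", and ".*" tagged CD, VBG, and NN, "42" receives CD, "running" receives VBG, and "cat" falls through to NN, so rule order matters and a catch-all pattern belongs last. Because a Matcher is stateful, a tagger that reuses matchers this way is not safe to call from several threads at once without synchronization.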
public java.lang.String toString()
Return tagger description.
Overrides: toString in class UnigramTagger