java.lang.Object
edu.northwestern.at.utils.IsCloseableObject
edu.northwestern.at.utils.corpuslinguistics.postagger.AbstractPartOfSpeechTagger
edu.northwestern.at.utils.corpuslinguistics.postagger.unigram.UnigramTagger
edu.northwestern.at.utils.corpuslinguistics.postagger.regexp.RegexpTagger
public class RegexpTagger
Regular Expression Part of Speech tagger.
The regular expression part of speech tagger uses regular expressions to assign a part of speech tag to a spelling.
| Field Summary | |
|---|---|
| protected java.util.regex.Matcher[] | regexpMatchers |
| protected java.util.regex.Pattern[] | regexpPatterns: Parts of speech for each lexical rule. |
| protected java.lang.String[] | regexpTags |
| Fields inherited from class edu.northwestern.at.utils.corpuslinguistics.postagger.AbstractPartOfSpeechTagger |
|---|
contextRules, contextualSmoother, dynamicLexicon, lexicalRules, lexicalSmoother, lexicon, logger, partOfSpeechGuesser, postTokenizer, retagger, ruleCorrections, transitionMatrix |
| Constructor Summary | |
|---|---|
| RegexpTagger() | Create a regular expression tagger. |
| Method Summary | |
|---|---|
| void | setLexicalRules(java.lang.String[] lexicalRules): Set lexical rules for tagging. |
| java.lang.String | tagWord(java.lang.String word): Tag a single word. |
| java.lang.String | toString(): Return tagger description. |
| boolean | usesLexicalRules(): See if tagger uses lexical rules. |
| Methods inherited from class edu.northwestern.at.utils.corpuslinguistics.postagger.unigram.UnigramTagger |
|---|
tagAdornedWordList, tagWord |
| Methods inherited from class edu.northwestern.at.utils.corpuslinguistics.postagger.AbstractPartOfSpeechTagger |
|---|
clearRuleCorrections, createPartOfSpeechGuesser, getDynamicLexicon, getLexicon, getLexicon, getLogger, getMostCommonTag, getPartOfSpeechGuesser, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setLexicon, setLogger, setPartOfSpeechGuesser, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, tagSentences, usesContextRules, usesTransitionProbabilities |
| Methods inherited from class edu.northwestern.at.utils.IsCloseableObject |
|---|
close |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Methods inherited from interface edu.northwestern.at.utils.corpuslinguistics.postagger.PartOfSpeechTagger |
|---|
clearRuleCorrections, getLexicon, getLexicon, getPartOfSpeechGuesser, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setLexicon, setPartOfSpeechGuesser, setRetagger, setTransitionMatrix, tagAdornedWordList, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, tagSentences, usesContextRules, usesTransitionProbabilities |
| Methods inherited from interface edu.northwestern.at.utils.corpuslinguistics.postagger.CanTagOneWord |
|---|
tagWord |
| Methods inherited from interface edu.northwestern.at.utils.IsCloseable |
|---|
close |
| Field Detail |
|---|
protected java.util.regex.Pattern[] regexpPatterns
protected java.util.regex.Matcher[] regexpMatchers
protected java.lang.String[] regexpTags
| Constructor Detail |
|---|
public RegexpTagger()
| Method Detail |
|---|
public boolean usesLexicalRules()
Specified by: usesLexicalRules in interface PartOfSpeechTagger
Overrides: usesLexicalRules in class AbstractPartOfSpeechTagger
public void setLexicalRules(java.lang.String[] lexicalRules)
throws InvalidRuleException
Specified by: setLexicalRules in interface PartOfSpeechTagger
Overrides: setLexicalRules in class AbstractPartOfSpeechTagger
Parameters: lexicalRules - String array of lexical rules.
Throws: InvalidRuleException - if a rule is bad.
For the regular expression tagger, each rule takes the form:
regular-expression \t part-of-speech-tag
where "regular-expression" is the regular expression and "part-of-speech-tag" is the part of speech tag to assign to a spelling matched by the regular expression. An ASCII tab character (\t) separates the pattern from the tag.
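The rule format above can be illustrated with a small standalone sketch. It is not the library's implementation: the class `RuleParsingSketch`, its nested `Rule` type, the choice to split at the last tab, the trimming of surrounding whitespace, and the use of `IllegalArgumentException` in place of `InvalidRuleException` are all assumptions made for illustration. The tag `CD` is only a placeholder tag name.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper, not part of the documented API.
class RuleParsingSketch {

    /** One compiled rule: a pattern and the tag it assigns. */
    static final class Rule {
        final Pattern pattern;
        final String tag;

        Rule(Pattern pattern, String tag) {
            this.pattern = pattern;
            this.tag = tag;
        }
    }

    /**
     * Parses a rule of the form "regular-expression \t part-of-speech-tag".
     * Assumption: the rule is split at the LAST tab, so that the pattern
     * itself may contain a tab; surrounding whitespace is trimmed.
     */
    static Rule parseRule(String rule) {
        int tab = rule.lastIndexOf('\t');
        if (tab < 0) {
            throw new IllegalArgumentException("Missing tab separator: " + rule);
        }
        String regex = rule.substring(0, tab).trim();
        String tag = rule.substring(tab + 1).trim();
        if (regex.isEmpty() || tag.isEmpty()) {
            throw new IllegalArgumentException("Empty pattern or tag: " + rule);
        }
        try {
            return new Rule(Pattern.compile(regex), tag);
        } catch (PatternSyntaxException e) {
            // Stand-in for the "rule is bad" case described above.
            throw new IllegalArgumentException("Bad regular expression: " + regex, e);
        }
    }

    public static void main(String[] args) {
        Rule rule = parseRule("^[0-9]+$\tCD");
        System.out.println(rule.pattern.pattern() + " -> " + rule.tag);
    }
}
```

With the real class, an array of such rule strings would presumably be passed to setLexicalRules, which reports a malformed rule through InvalidRuleException.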
public java.lang.String tagWord(java.lang.String word)
Specified by: tagWord in interface CanTagOneWord
Overrides: tagWord in class UnigramTagger
Parameters: word - The word.
Applies each of the regular expressions stored in the lexical rules lexicon in turn and returns the tag associated with the first matching regular expression.
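The first-match behavior described above means that rule order matters. The following standalone sketch shows one way such a lookup could work using parallel arrays of matchers and tags. It is an illustration under stated assumptions, not the library's code: the class `FirstMatchTaggerSketch` is hypothetical, whole-word matching via `Matcher.matches()` is assumed (the real tagger might use `find()`), and returning `null` when no rule matches is a placeholder for whatever fallback the real tagger applies. The tags `CD`, `VBG`, and `NN` are placeholder tag names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical first-match tagger, not part of the documented API.
class FirstMatchTaggerSketch {
    // Parallel arrays: matchers[i] decides whether tags[i] applies.
    private final Matcher[] matchers;
    private final String[] tags;

    FirstMatchTaggerSketch(String[] rules) {
        matchers = new Matcher[rules.length];
        tags = new String[rules.length];
        for (int i = 0; i < rules.length; i++) {
            // Each rule is "pattern<TAB>tag".
            String[] parts = rules[i].split("\t", 2);
            if (parts.length != 2) {
                throw new IllegalArgumentException("Missing tab separator: " + rules[i]);
            }
            // Create each matcher once and reuse it for every word.
            matchers[i] = Pattern.compile(parts[0]).matcher("");
            tags[i] = parts[1];
        }
    }

    /** Returns the tag of the first rule whose pattern matches the whole word. */
    String tagWord(String word) {
        for (int i = 0; i < matchers.length; i++) {
            if (matchers[i].reset(word).matches()) {
                return tags[i];
            }
        }
        return null; // Assumption: no fallback tag in this sketch.
    }

    public static void main(String[] args) {
        FirstMatchTaggerSketch tagger = new FirstMatchTaggerSketch(new String[] {
            "^[0-9]+$\tCD",   // digits only
            ".*ing$\tVBG",    // words ending in "ing"
            ".*\tNN"          // catch-all, so it must come last
        });
        System.out.println(tagger.tagWord("42"));
        System.out.println(tagger.tagWord("running"));
        System.out.println(tagger.tagWord("cat"));
    }
}
```

Because the first matching rule wins, a catch-all pattern such as `.*` placed first would shadow every rule after it. Reused `Matcher` objects are not thread-safe, so a tagger built this way should not be shared across threads without synchronization.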
public java.lang.String toString()
Overrides: toString in class UnigramTagger