RegexpTagger (MorphAdorner)

java.lang.Object
- edu.northwestern.at.utils.IsCloseableObject
  - edu.northwestern.at.morphadorner.corpuslinguistics.postagger.AbstractPartOfSpeechTagger
    - edu.northwestern.at.morphadorner.corpuslinguistics.postagger.unigram.UnigramTagger
      - edu.northwestern.at.morphadorner.corpuslinguistics.postagger.regexp.RegexpTagger

All Implemented Interfaces:

UsesLexicon, CanTagOneWord, PartOfSpeechTagger, IsCloseable, UsesLogger
```
public class RegexpTagger
extends UnigramTagger
implements PartOfSpeechTagger, CanTagOneWord
```
Regular Expression Part of Speech tagger.
The regular expression part of speech tagger uses regular expressions to assign a part of speech tag to a spelling.

Field Summary

Fields
Modifier and Type	Field and Description
`protected java.util.regex.Matcher[]`	`regexpMatchers`
`protected java.util.regex.Pattern[]`	`regexpPatterns` Compiled regular expression for each lexical rule.
`protected java.lang.String[]`	`regexpTags`

Fields inherited from class edu.northwestern.at.morphadorner.corpuslinguistics.postagger.AbstractPartOfSpeechTagger
contextRules, contextualSmoother, dynamicLexicon, lexicalRules, lexicalSmoother, lexicon, logger, partOfSpeechGuesser, postTokenizer, retagger, ruleCorrections, transitionMatrix

Constructor Summary

Constructors
Constructor and Description
`RegexpTagger()` Create a regular expression tagger.

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`setLexicalRules(java.lang.String[] lexicalRules)` Set lexical rules for tagging.
`java.lang.String`	`tagWord(java.lang.String word)` Tag a single word.
`java.lang.String`	`toString()` Return tagger description.
`boolean`	`usesLexicalRules()` See if tagger uses lexical rules.

Methods inherited from class edu.northwestern.at.morphadorner.corpuslinguistics.postagger.unigram.UnigramTagger
tagAdornedWordList, tagWord

Methods inherited from class edu.northwestern.at.morphadorner.corpuslinguistics.postagger.AbstractPartOfSpeechTagger
clearRuleCorrections, createPartOfSpeechGuesser, getContextualSmoother, getDynamicLexicon, getLexicalSmoother, getLexicon, getLexicon, getLogger, getMostCommonTag, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setContextualSmoother, setLexicalSmoother, setLexicon, setLogger, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, tagSentences, usesContextRules, usesTransitionProbabilities

Methods inherited from class edu.northwestern.at.utils.IsCloseableObject
close

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface edu.northwestern.at.morphadorner.corpuslinguistics.postagger.PartOfSpeechTagger
clearRuleCorrections, getContextualSmoother, getLexicalSmoother, getLexicon, getLexicon, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextRules, setContextualSmoother, setLexicalSmoother, setLexicon, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordList, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, tagSentences, usesContextRules, usesTransitionProbabilities

Methods inherited from interface edu.northwestern.at.morphadorner.corpuslinguistics.postagger.CanTagOneWord
tagWord

Methods inherited from interface edu.northwestern.at.utils.IsCloseable
close

- Field Detail
  - regexpPatterns
```
protected java.util.regex.Pattern[] regexpPatterns
```
    Compiled regular expression for each lexical rule.
  - regexpMatchers
```
protected java.util.regex.Matcher[] regexpMatchers
```
  - regexpTags
```
protected java.lang.String[] regexpTags
```
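    The three arrays appear to be parallel, with index i of regexpPatterns, regexpMatchers, and regexpTags describing the same lexical rule; the Javadoc does not state this, so treat it as an assumption. The standalone sketch below models that layout with hypothetical names (ParallelRuleArrays, ruleMatches) and reuses one Matcher per rule through Matcher.reset(CharSequence) instead of allocating a new Matcher for every word. It also assumes whole-word matching with Matcher.matches().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of the regexpPatterns / regexpMatchers / regexpTags
// fields; not MorphAdorner's actual code.
public class ParallelRuleArrays {
    final Pattern[] patterns;
    final Matcher[] matchers;
    final String[] tags;

    ParallelRuleArrays(String[] regexes, String[] tagValues) {
        if (regexes.length != tagValues.length) {
            throw new IllegalArgumentException("Each regex needs exactly one tag");
        }
        patterns = new Pattern[regexes.length];
        matchers = new Matcher[regexes.length];
        tags = new String[regexes.length];
        for (int i = 0; i < regexes.length; i++) {
            patterns[i] = Pattern.compile(regexes[i]);
            // Create the Matcher once; reset(word) rebinds it to new input later.
            matchers[i] = patterns[i].matcher("");
            tags[i] = tagValues[i];
        }
    }

    // True if rule i matches the entire word (assumed whole-word semantics).
    boolean ruleMatches(int i, String word) {
        return matchers[i].reset(word).matches();
    }

    public static void main(String[] args) {
        // Illustrative Penn Treebank-style tags, not MorphAdorner's tag set.
        ParallelRuleArrays rules = new ParallelRuleArrays(
            new String[] {"[0-9]+", ".*ing"},
            new String[] {"CD", "VBG"});
        System.out.println(rules.ruleMatches(0, "1776"));    // true
        System.out.println(rules.ruleMatches(1, "running")); // true
        System.out.println(rules.ruleMatches(1, "ran"));     // false
    }
}
```

    Because a Matcher holds mutable state, reusing stored matchers this way is only safe while one thread at a time uses the tagger; Pattern objects, by contrast, are immutable and safe to share across threads.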
- Constructor Detail
  - RegexpTagger
```
public RegexpTagger()
```
    Create a regular expression tagger.
- Method Detail
  - usesLexicalRules
```
public boolean usesLexicalRules()
```
    See if tagger uses lexical rules.
    
    Specified by:
    
    usesLexicalRules in interface PartOfSpeechTagger
    
    Overrides:
    
    usesLexicalRules in class AbstractPartOfSpeechTagger
    
    Returns:
    True, since this tagger uses regular-expression-based lexical rules.
  - setLexicalRules
```
public void setLexicalRules(java.lang.String[] lexicalRules)
                     throws InvalidRuleException
```
    Set lexical rules for tagging.
    
    Specified by:
    
    setLexicalRules in interface PartOfSpeechTagger
    
    Overrides:
    
    setLexicalRules in class AbstractPartOfSpeechTagger
    
    Parameters:
    lexicalRules - String array of lexical rules.
    
    Throws:
    
    InvalidRuleException - if a rule is bad.
    For the regular expression tagger, each rule takes the form:
    
    regular-expression \t part-of-speech-tag
    
    where "regular-expression" is the regular expression and "part-of-speech-tag" is the part of speech tag to assign to a spelling matched by the regular expression. An ASCII tab character (\t) separates the pattern from the tag.
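    A minimal sketch of parsing rules in this form, assuming that each rule is split at its first tab, that the text before the tab is compiled with java.util.regex.Pattern, and that the trimmed text after it is the tag. RegexpRuleParser, Rule, and BadRuleException are hypothetical names; BadRuleException stands in for InvalidRuleException and is unchecked here only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class RegexpRuleParser {
    // Hypothetical stand-in for InvalidRuleException.
    static class BadRuleException extends IllegalArgumentException {
        BadRuleException(String message) {
            super(message);
        }
    }

    // One parsed rule: a compiled pattern and the tag it assigns.
    static final class Rule {
        final Pattern pattern;
        final String tag;

        Rule(Pattern pattern, String tag) {
            this.pattern = pattern;
            this.tag = tag;
        }
    }

    // Parses rules of the form "regular-expression<TAB>part-of-speech-tag".
    static List<Rule> parse(String[] lexicalRules) {
        List<Rule> rules = new ArrayList<>();
        for (String line : lexicalRules) {
            int tab = line.indexOf('\t');
            // -1 means no tab at all; 0 means the pattern part is empty.
            if (tab <= 0) {
                throw new BadRuleException("Missing pattern or tab: " + line);
            }
            String regex = line.substring(0, tab);
            String tag = line.substring(tab + 1).trim();
            if (tag.isEmpty()) {
                throw new BadRuleException("Missing tag: " + line);
            }
            try {
                rules.add(new Rule(Pattern.compile(regex), tag));
            } catch (PatternSyntaxException e) {
                throw new BadRuleException("Bad regular expression: " + regex);
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        List<Rule> rules = parse(new String[] {"[0-9]+\tCD", ".*ing\tVBG"});
        System.out.println(rules.size());      // 2
        System.out.println(rules.get(1).tag);  // VBG
        try {
            parse(new String[] {"[a-z+\tNN"}); // unclosed character class
        } catch (BadRuleException e) {
            System.out.println(e.getMessage()); // Bad regular expression: [a-z+
        }
    }
}
```

    Validating every rule up front means a malformed rule file fails when the rules are loaded rather than partway through tagging a text.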
  - tagWord
```
public java.lang.String tagWord(java.lang.String word)
```
    Tag a single word.
    
    Specified by:
    
    tagWord in interface CanTagOneWord
    
    Overrides:
    
    tagWord in class UnigramTagger
    
    Parameters:
    word - The word.
    
    Returns:
    The part of speech for the word.
    Applies each of the regular expressions stored in the lexical rules lexicon and returns the tag associated with the first matching regular expression.
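    The first-match behaviour can be sketched as follows. The sketch assumes whole-word matching via Matcher.matches() and returns a caller-supplied default tag when no rule matches; the Javadoc does not say whether the real tagWord uses matches() or find(), or what it returns for an unmatched word. FirstMatchTagger is a hypothetical name.

```java
import java.util.regex.Pattern;

public class FirstMatchTagger {
    private final Pattern[] patterns;
    private final String[] tags;
    private final String defaultTag;

    FirstMatchTagger(String[] regexes, String[] tags, String defaultTag) {
        if (regexes.length != tags.length) {
            throw new IllegalArgumentException("Each regex needs exactly one tag");
        }
        this.patterns = new Pattern[regexes.length];
        for (int i = 0; i < regexes.length; i++) {
            patterns[i] = Pattern.compile(regexes[i]);
        }
        this.tags = tags.clone();
        this.defaultTag = defaultTag;
    }

    // Tries the rules in their listed order and returns the tag of the first
    // pattern that matches the whole word, or defaultTag if none matches.
    String tagWord(String word) {
        for (int i = 0; i < patterns.length; i++) {
            if (patterns[i].matcher(word).matches()) {
                return tags[i];
            }
        }
        return defaultTag;
    }

    public static void main(String[] args) {
        // Illustrative Penn Treebank-style tags, not MorphAdorner's tag set.
        FirstMatchTagger tagger = new FirstMatchTagger(
            new String[] {"[0-9]+", ".*ing", ".*s"},
            new String[] {"CD", "VBG", "NNS"},
            "NN");
        System.out.println(tagger.tagWord("1776"));    // CD
        System.out.println(tagger.tagWord("running")); // VBG
        System.out.println(tagger.tagWord("tables"));  // NNS
        System.out.println(tagger.tagWord("table"));   // NN (no rule matched)
    }
}
```

    Under first-match semantics rule order matters: a catch-all pattern such as .* placed first would shadow every later rule, so more specific patterns belong earlier in the list.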
  - toString
```
public java.lang.String toString()
```
    Return tagger description.
    
    Overrides:
    
    toString in class UnigramTagger
    
    Returns:
    Tagger description.
