edu.northwestern.at.utils.corpuslinguistics.postagger
Interface PartOfSpeechTagger

All Known Subinterfaces:
PartOfSpeechRetagger
All Known Implementing Classes:
AbstractPartOfSpeechTagger, AffixTagger, BigramHybridTagger, BigramTagger, DefaultPartOfSpeechRetagger, DefaultPartOfSpeechTagger, HeppleTagger, IRetagger, NoopRetagger, ProperNounRetagger, RegexpTagger, SimpleRuleBasedTagger, SimpleTagger, SuffixTagger, TrigramHybridTagger, TrigramTagger, UnigramTagger

public interface PartOfSpeechTagger

Interface for a Part of Speech tagger.


Method Summary
 void clearRuleCorrections()
          Clear count of successful rule applications.
 Lexicon getLexicon()
          Get the lexicon.
 Lexicon getLexicon(java.lang.String word)
          Get the lexicon for a specific word.
 PartOfSpeechGuesser getPartOfSpeechGuesser()
          Get part of speech guesser.
 PartOfSpeechRetagger getRetagger()
          Get part of speech retagger.
 int getRuleCorrections()
          Get count of successful rule applications.
 int getTagCount(java.lang.String word, java.lang.String tag)
          Get count of times a word appears with a given tag.
 java.util.List<java.lang.String> getTagsForWord(java.lang.String word)
          Get potential part of speech tags for a word.
 TransitionMatrix getTransitionMatrix()
          Get tag transition probabilities matrix.
 void incrementRuleCorrections()
          Increment count of successful rule applications.
 <T extends AdornedWord> java.util.List<T> retagWords(java.util.List<T> taggedSentence)
          Retag words in a tagged sentence.
 void setContextRules(java.lang.String[] contextRules)
          Set context rules for tagging.
 void setLexicalRules(java.lang.String[] lexicalRules)
          Set lexical rules for tagging.
 void setLexicon(Lexicon lexicon)
          Set the lexicon.
 void setPartOfSpeechGuesser(PartOfSpeechGuesser guesser)
          Set part of speech guesser.
 void setRetagger(PartOfSpeechRetagger retagger)
          Set part of speech retagger.
 void setTransitionMatrix(TransitionMatrix transitionMatrix)
          Set tag transition probabilities matrix.
 <T extends AdornedWord> java.util.List<T> tagAdornedWordList(java.util.List<T> sentence)
          Tag a list of adorned words.
 <T extends AdornedWord> java.util.List<T> tagAdornedWordSentence(java.util.List<T> sentence)
          Tag a sentence.
 <T extends AdornedWord> java.util.List<java.util.List<T>> tagAdornedWordSentences(java.util.List<java.util.List<T>> sentences)
          Tag a list of sentences.
 java.util.List<AdornedWord> tagSentence(java.util.List<java.lang.String> sentence)
          Tag a sentence.
 java.util.List<java.util.List<AdornedWord>> tagSentences(java.util.List<java.util.List<java.lang.String>> sentences)
          Tag a list of sentences.
 boolean usesContextRules()
          See if tagger uses context rules.
 boolean usesLexicalRules()
          See if tagger uses lexical rules.
 boolean usesTransitionProbabilities()
          See if tagger uses a probability transition matrix.
 

Method Detail

usesContextRules

boolean usesContextRules()
See if tagger uses context rules.

Returns:
True if tagger uses context rules.

usesLexicalRules

boolean usesLexicalRules()
See if tagger uses lexical rules.

Returns:
True if tagger uses lexical rules.

usesTransitionProbabilities

boolean usesTransitionProbabilities()
See if tagger uses a probability transition matrix.

Returns:
True if tagger uses a probability transition matrix.

setContextRules

void setContextRules(java.lang.String[] contextRules)
                     throws InvalidRuleException
Set context rules for tagging.

Parameters:
contextRules - String array of context rules.
Throws:
InvalidRuleException - if a rule is invalid.

For taggers which do not use context rules, this is a no-op.


setLexicalRules

void setLexicalRules(java.lang.String[] lexicalRules)
                     throws InvalidRuleException
Set lexical rules for tagging.

Parameters:
lexicalRules - String array of lexical rules.
Throws:
InvalidRuleException - if a rule is invalid.

For taggers which do not use lexical rules, this is a no-op.
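As a rough illustration of how the usesContextRules/usesLexicalRules queries might be combined with the rule setters, the sketch below loads each rule set only into a tagger that reports using it. Tagger, FixedTagger, and configureRules are hypothetical names written for this example, and InvalidRuleException is a local stand-in for the library class (assumed here to be a checked exception). Only the four method names are taken from this interface.

```java
import java.util.ArrayList;
import java.util.List;

public class RuleConfigSketch {
    /** Local stand-in for the library's InvalidRuleException (assumed to be checked). */
    static class InvalidRuleException extends Exception {
        InvalidRuleException(String message) { super(message); }
    }

    /** Hypothetical stand-in declaring only the four methods this sketch calls. */
    interface Tagger {
        boolean usesContextRules();
        boolean usesLexicalRules();
        void setContextRules(String[] contextRules) throws InvalidRuleException;
        void setLexicalRules(String[] lexicalRules) throws InvalidRuleException;
    }

    /** Toy tagger with fixed capabilities; setters are no-ops for unused rule kinds. */
    static class FixedTagger implements Tagger {
        private final boolean context;
        private final boolean lexical;
        String[] contextRules = new String[0];
        String[] lexicalRules = new String[0];

        FixedTagger(boolean context, boolean lexical) {
            this.context = context;
            this.lexical = lexical;
        }
        public boolean usesContextRules() { return context; }
        public boolean usesLexicalRules() { return lexical; }
        public void setContextRules(String[] rules) { if (context) contextRules = rules; }
        public void setLexicalRules(String[] rules) { if (lexical) lexicalRules = rules; }
    }

    /** Loads each rule set only if the tagger uses it; returns the names of the sets loaded. */
    static List<String> configureRules(Tagger tagger, String[] contextRules, String[] lexicalRules) {
        List<String> loaded = new ArrayList<>();
        try {
            if (tagger.usesContextRules()) {
                tagger.setContextRules(contextRules);
                loaded.add("context");
            }
            if (tagger.usesLexicalRules()) {
                tagger.setLexicalRules(lexicalRules);
                loaded.add("lexical");
            }
        } catch (InvalidRuleException e) {
            // Rethrown unchecked so callers of this sketch need no throws clause.
            throw new IllegalArgumentException("Rule set rejected: " + e.getMessage(), e);
        }
        return loaded;
    }
}
```

Because the setters are documented as no-ops when a rule kind is unused, the explicit checks are not strictly required. They mainly let the caller skip reading rule files that would be ignored.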


getLexicon

Lexicon getLexicon()
Get the lexicon.

Returns:
The lexicon. May be null if the tagger does not use a lexicon.

getLexicon

Lexicon getLexicon(java.lang.String word)
Get the lexicon for a specific word.

Parameters:
word - The word whose associated lexicon we want.
Returns:
The lexicon. May be null if the tagger does not use a lexicon.

setLexicon

void setLexicon(Lexicon lexicon)
Set the lexicon.

Parameters:
lexicon - Lexicon used for tagging.

getTransitionMatrix

TransitionMatrix getTransitionMatrix()
Get tag transition probabilities matrix.

Returns:
Tag transition probabilities matrix. May be null for taggers which do not use a transition matrix.

setTransitionMatrix

void setTransitionMatrix(TransitionMatrix transitionMatrix)
Set tag transition probabilities matrix.

Parameters:
transitionMatrix - Tag transition probabilities matrix.

For taggers which do not use transition matrices, this is a no-op.


getPartOfSpeechGuesser

PartOfSpeechGuesser getPartOfSpeechGuesser()
Get part of speech guesser.

Returns:
The part of speech guesser.

setPartOfSpeechGuesser

void setPartOfSpeechGuesser(PartOfSpeechGuesser guesser)
Set part of speech guesser.

Parameters:
guesser - The part of speech guesser.

getRetagger

PartOfSpeechRetagger getRetagger()
Get part of speech retagger.

Returns:
The part of speech retagger. May be null.

setRetagger

void setRetagger(PartOfSpeechRetagger retagger)
Set part of speech retagger.

Parameters:
retagger - The part of speech retagger.

getTagsForWord

java.util.List<java.lang.String> getTagsForWord(java.lang.String word)
Get potential part of speech tags for a word.

Parameters:
word - The word whose part of speech tags we want.
Returns:
The list of part of speech tags. May be empty.

getTagCount

int getTagCount(java.lang.String word,
                java.lang.String tag)
Get count of times a word appears with a given tag.

Parameters:
word - The word.
tag - The part of speech tag.
Returns:
The number of times the word appears with the given tag.
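One way getTagsForWord and getTagCount might be used together is to pick the most frequent tag for a word. The sketch below is a minimal illustration under stated assumptions: TagCounts is a hypothetical stand-in declaring only those two methods, toyCounts supplies invented counts, and the tag strings are placeholders rather than tags from any real tag set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class MostFrequentTagSketch {
    /** Hypothetical stand-in declaring only the two query methods used here. */
    interface TagCounts {
        List<String> getTagsForWord(String word);
        int getTagCount(String word, String tag);
    }

    /** Returns the candidate tag with the highest count, or empty if the word has no tags. */
    static Optional<String> mostFrequentTag(TagCounts tagger, String word) {
        String best = null;
        int bestCount = -1;
        for (String tag : tagger.getTagsForWord(word)) {
            int count = tagger.getTagCount(word, tag);
            if (count > bestCount) {   // on ties, the first tag encountered wins
                best = tag;
                bestCount = count;
            }
        }
        return Optional.ofNullable(best);
    }

    /** Toy in-memory counts, invented for demonstration only. */
    static TagCounts toyCounts() {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        counts.put("run", Map.of("vb", 7, "nn", 3));
        return new TagCounts() {
            public List<String> getTagsForWord(String word) {
                return new ArrayList<>(counts.getOrDefault(word, Map.of()).keySet());
            }
            public int getTagCount(String word, String tag) {
                return counts.getOrDefault(word, Map.of()).getOrDefault(tag, 0);
            }
        };
    }
}
```

Returning an Optional mirrors the documented possibility that the tag list is empty, so the caller decides what to do with unknown words.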

clearRuleCorrections

void clearRuleCorrections()
Clear count of successful rule applications.


incrementRuleCorrections

void incrementRuleCorrections()
Increment count of successful rule applications.


getRuleCorrections

int getRuleCorrections()
Get count of successful rule applications.

Returns:
The count of successful rule applications.
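The three rule-correction methods describe a simple counter. The sketch below shows one plausible way an implementing class might back them. It is not taken from the library: the class name, the applyToyRule method, and the "unk"/"nn" tags are all hypothetical.

```java
public class RuleCorrectionCounterSketch {
    /** Number of successful rule applications since the last clear. */
    private int ruleCorrections = 0;

    public void clearRuleCorrections() { ruleCorrections = 0; }

    public void incrementRuleCorrections() { ruleCorrections++; }

    public int getRuleCorrections() { return ruleCorrections; }

    /** Toy rule pass: copies the tags, replaces each "unk" with "nn", and counts each replacement. */
    public String[] applyToyRule(String[] tags) {
        String[] out = tags.clone();
        for (int i = 0; i < out.length; i++) {
            if ("unk".equals(out[i])) {
                out[i] = "nn";
                incrementRuleCorrections();
            }
        }
        return out;
    }
}
```

A caller would typically clear the counter before tagging a text and read it afterwards to see how many corrections the rules made.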


tagSentences

java.util.List<java.util.List<AdornedWord>> tagSentences(java.util.List<java.util.List<java.lang.String>> sentences)
Tag a list of sentences.

Parameters:
sentences - The list of sentences.
Returns:
The sentences with words adorned with parts of speech.

The input is a List of sentences, each of which is a List of words to be tagged. The output is a List of Lists of AdornedWords, one inner list per input sentence.
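The nested input and output shapes can be easier to see in code. The sketch below is a toy illustration, not the library's implementation. TaggedWord is a hypothetical stand-in for AdornedWord, the lookup table and its tags are invented, and building tagSentences by calling tagSentence once per sentence is an assumption about how an implementation could work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TagSentencesSketch {
    /** Hypothetical stand-in for AdornedWord: a spelling plus one tag. */
    static final class TaggedWord {
        final String spelling;
        final String tag;

        TaggedWord(String spelling, String tag) {
            this.spelling = spelling;
            this.tag = tag;
        }
    }

    /** Invented lookup table; the tags are placeholders, not a real tag set. */
    private static final Map<String, String> TOY_TAGS =
        Map.of("the", "dt", "dog", "nn", "barks", "vb");

    /** Tags one sentence: a list of string tokens in, a list of tagged words out. */
    static List<TaggedWord> tagSentence(List<String> sentence) {
        List<TaggedWord> tagged = new ArrayList<>();
        for (String token : sentence) {
            String tag = TOY_TAGS.getOrDefault(token.toLowerCase(Locale.ROOT), "unk");
            tagged.add(new TaggedWord(token, tag));
        }
        return tagged;
    }

    /** Tags many sentences: a list of token lists in, a list of tagged-word lists out. */
    static List<List<TaggedWord>> tagSentences(List<List<String>> sentences) {
        List<List<TaggedWord>> result = new ArrayList<>();
        for (List<String> sentence : sentences) {
            result.add(tagSentence(sentence));
        }
        return result;
    }
}
```

In this sketch, passing two sentences yields two inner lists, each the same length as the corresponding input sentence.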


tagSentence

java.util.List<AdornedWord> tagSentence(java.util.List<java.lang.String> sentence)
Tag a sentence.

Parameters:
sentence - The sentence as a List of string tokens.
Returns:
The tagged sentence as a List of AdornedWords.

tagAdornedWordSentence

<T extends AdornedWord> java.util.List<T> tagAdornedWordSentence(java.util.List<T> sentence)
Tag a sentence.

Parameters:
sentence - The sentence as a List of adorned words.
Returns:
A List of AdornedWords for the words in the sentence, tagged with parts of speech.

The input sentence is a List of adorned words to be tagged. The output is the same list with parts of speech added/modified.
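Reading "the same list" literally, a tagger of this kind updates the words it is given and hands the list back. The sketch below illustrates that contract with a toy rule. Word and SimpleWord are hypothetical stand-ins for AdornedWord, their accessor names are assumptions, and the capitalization rule and the "np"/"nn" tags are invented.

```java
import java.util.List;

public class InPlaceTaggingSketch {
    /** Hypothetical stand-in for AdornedWord with only the accessors this sketch needs. */
    interface Word {
        String getSpelling();
        String getPartsOfSpeech();
        void setPartsOfSpeech(String tag);
    }

    /** Minimal mutable word used for the demonstration. */
    static class SimpleWord implements Word {
        private final String spelling;
        private String partsOfSpeech = "";

        SimpleWord(String spelling) { this.spelling = spelling; }
        public String getSpelling() { return spelling; }
        public String getPartsOfSpeech() { return partsOfSpeech; }
        public void setPartsOfSpeech(String tag) { this.partsOfSpeech = tag; }
    }

    /** Toy tagger: capitalized words get "np", all others "nn"; mutates and returns the input list. */
    static <T extends Word> List<T> tagAdornedWordSentence(List<T> sentence) {
        for (T word : sentence) {
            String spelling = word.getSpelling();
            boolean capitalized = !spelling.isEmpty() && Character.isUpperCase(spelling.charAt(0));
            word.setPartsOfSpeech(capitalized ? "np" : "nn");
        }
        return sentence;
    }
}
```

Because the method is generic in T, a caller that passes a list of a richer word subtype gets that subtype back without casting.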


tagAdornedWordSentences

<T extends AdornedWord> java.util.List<java.util.List<T>> tagAdornedWordSentences(java.util.List<java.util.List<T>> sentences)
Tag a list of sentences.

Parameters:
sentences - The list of sentences.
Returns:
The sentences with words adorned with parts of speech.

The input is a List of sentences, each of which is a List of adorned words to be tagged. The output is a List of Lists of AdornedWords, one inner list per input sentence.


tagAdornedWordList

<T extends AdornedWord> java.util.List<T> tagAdornedWordList(java.util.List<T> sentence)
Tag a list of adorned words.

Parameters:
sentence - The sentence as a List of adorned words.
Returns:
The tagged sentence (same as input with parts of speech added).

retagWords

<T extends AdornedWord> java.util.List<T> retagWords(java.util.List<T> taggedSentence)
Retag words in a tagged sentence.

Parameters:
taggedSentence - The tagged sentence as a List of adorned words.
Returns:
The retagged sentence.