HeppleTagger (MorphAdorner)

- java.lang.Object
  - edu.northwestern.at.utils.IsCloseableObject
    - edu.northwestern.at.morphadorner.corpuslinguistics.postagger.AbstractPartOfSpeechTagger
      - edu.northwestern.at.morphadorner.corpuslinguistics.postagger.hepple.HeppleTagger

All Implemented Interfaces:

UsesLexicon, PartOfSpeechRetagger, PartOfSpeechTagger, IsCloseable, UsesLogger
```
public class HeppleTagger
extends AbstractPartOfSpeechTagger
implements PartOfSpeechTagger, PartOfSpeechRetagger
```
HeppleTagger: Mark Hepple's Part of Speech Tagger.
Copyright (c) 2001-2005, The University of Sheffield.

This file is part of GATE (see http://gate.ac.uk/), and is free software, licenced under the GNU Library General Public License, Version 2, June 1991 (in the distribution as file licence.html, and also available at http://gate.ac.uk/gate/licence.html).

HeppleTagger was originally written by Mark Hepple. The GATE version contains modifications by Valentin Tablan and Niraj Aswani.

This version also contains many modifications made at Northwestern University for use in the WordHoard project.

Comments:

Implements a version of the decision list based tagging method described in:

M. Hepple. 2000. Independence and Commitment: Assumptions for Rapid Training and Execution of Rule-based Part-of-Speech Taggers. Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics (ACL-2000). Hong Kong, October 2000.
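The core of the decision-list method can be illustrated with a short Java sketch. It makes a simplifying assumption: the rules for a word's current tag are tried in order, and only the first rule whose condition holds is applied. `DecisionListSketch`, `SketchRule`, and `applyDecisionList` are hypothetical names, not part of MorphAdorner. The `NN`/`VB`/`TO` tags follow Brill's well-known "to race" example and are used only for illustration, since this version of the tagger does not depend on the Penn Treebank tag set.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class DecisionListSketch {
    // Hypothetical rule: change the current tag to toTag when the condition,
    // evaluated over the whole tag window, holds.
    static final class SketchRule {
        final String toTag;
        final Predicate<String[]> condition;

        SketchRule(String toTag, Predicate<String[]> condition) {
            this.toTag = toTag;
            this.condition = condition;
        }
    }

    // Looks up the rules keyed by the middle word's current tag and applies
    // the first one whose condition holds; later rules are never consulted.
    static String applyDecisionList(Map<String, List<SketchRule>> rules,
                                    String[] tagWindow, int middle) {
        String current = tagWindow[middle];
        List<SketchRule> candidates = rules.get(current);
        if (candidates != null) {
            for (SketchRule rule : candidates) {
                if (rule.condition.test(tagWindow)) {
                    return rule.toTag;
                }
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // In the spirit of Brill's "NN VB PREVTAG TO": retag NN as VB when the
        // previous word (slot 2, just before the middle slot 3) is tagged TO.
        Map<String, List<SketchRule>> rules = Map.of(
            "NN", List.of(new SketchRule("VB", w -> "TO".equals(w[2]))));
        String[] window = {"PRP", "VBZ", "TO", "NN", "DT", "NN", "."};
        System.out.println(applyDecisionList(rules, window, 3)); // VB
    }
}
```

Stopping at the first matching rule is what makes the list a decision list rather than a cascade of independent rewrites.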

Modified by Philip R. Burns at Northwestern University to remove dependencies upon the Penn Treebank tag set, to allow pluggable handling of unknown words, to move all input/output for tagged text and rules into calling classes, and to allow the Hepple tagger to be used as a retagger.

Field Summary

Fields
Modifier and Type	Field and Description
`protected boolean`	`debug` Debug flag.
`java.lang.String[][]`	`lexBuff` Sliding parts of speech buffer.
`protected java.util.Map<java.lang.String,java.util.List<Rule>>`	`rules` Tagging rules.
`protected static java.lang.String`	`staart` Marks unused positions in sliding word buffer.
`protected static java.lang.String[]`	`staartLex`
`protected static AdornedWord`	`staartWordAndTag`
`java.lang.String[]`	`tagBuff` Sliding tag buffer.
`java.lang.String[]`	`wordBuff` Sliding word buffer.

Fields inherited from class edu.northwestern.at.morphadorner.corpuslinguistics.postagger.AbstractPartOfSpeechTagger
contextRules, contextualSmoother, dynamicLexicon, lexicalRules, lexicalSmoother, lexicon, logger, partOfSpeechGuesser, postTokenizer, retagger, ruleCorrections, transitionMatrix

Constructor Summary

Constructors
Constructor and Description
`HeppleTagger()` Construct a Hepple POS tagger.

Method Summary

Methods
Modifier and Type	Method and Description
`protected Rule`	`createNewRule(java.lang.String ruleId)` Creates a new rule of the required type according to the provided ID.
`boolean`	`getCanAddOrDeleteWords()` Can retagger add or delete words in the original sentence?
`protected java.lang.String[]`	`getPartsOfSpeech(java.lang.String word, boolean isFirstWord)` Get parts of speech for a word.
`protected <T extends AdornedWord> boolean`	`oneRetagStep(T adornedWord, boolean isFirstWord, java.util.List<T> taggedSentence)` Adds a new word to the current retagging window.
`protected boolean`	`oneStep(AdornedWord word, boolean isFirstWord, java.util.List taggedSentence)` Adds a new word to the current tagging window.
`<T extends AdornedWord> java.util.List<T>`	`retagSentence(java.util.List<T> sentence)` Retag one sentence.
`void`	`setCanAddOrDeleteWords(boolean canAddOrDeleteWords)` Can retagger add or delete words in the original sentence?
`void`	`setContextRules(java.lang.String[] contextRules)` Set context rules for tagging.
`<T extends AdornedWord> java.util.List<T>`	`tagAdornedWordList(java.util.List<T> sentence)` Tag an adorned word list.
`java.lang.String`	`toString()` Return tagger description.
`boolean`	`usesContextRules()` See if tagger uses context rules.

Methods inherited from class edu.northwestern.at.morphadorner.corpuslinguistics.postagger.AbstractPartOfSpeechTagger
clearRuleCorrections, createPartOfSpeechGuesser, getContextualSmoother, getDynamicLexicon, getLexicalSmoother, getLexicon, getLexicon, getLogger, getMostCommonTag, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextualSmoother, setLexicalRules, setLexicalSmoother, setLexicon, setLogger, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, tagSentences, usesLexicalRules, usesTransitionProbabilities

Methods inherited from class edu.northwestern.at.utils.IsCloseableObject
close

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Methods inherited from interface edu.northwestern.at.morphadorner.corpuslinguistics.postagger.PartOfSpeechTagger
clearRuleCorrections, getContextualSmoother, getLexicalSmoother, getLexicon, getLexicon, getPartOfSpeechGuesser, getPostTokenizer, getRetagger, getRuleCorrections, getTagCount, getTagsForWord, getTransitionMatrix, incrementRuleCorrections, retagWords, setContextualSmoother, setLexicalRules, setLexicalSmoother, setLexicon, setPartOfSpeechGuesser, setPostTokenizer, setRetagger, setTransitionMatrix, tagAdornedWordSentence, tagAdornedWordSentences, tagSentence, tagSentences, usesLexicalRules, usesTransitionProbabilities

Methods inherited from interface edu.northwestern.at.utils.IsCloseable
close

- Field Detail
  - rules
```
protected java.util.Map<java.lang.String,java.util.List<Rule>> rules
```
    Tagging rules.
    The tagging rules are stored in a map. The map keys are parts of speech. The value for each part of speech key is a list of rules which apply to that part of speech.
    
    Tagging rules are specified using the syntax proposed by Eric Brill in his dissertation. Rules take the general form:
    
    fromtag totag condition param1 param2
    
    where "fromtag" is the current tag for a word, "totag" is the new tag to replace the current tag if the "condition" is met, and "param1" and "param2" are optional values for the condition test. Each rule must specify at least the fromtag, totag, and condition. The fromtag values are the keys for the rules map.
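    Parsing rule lines of this form and grouping them by fromtag, as the rules map does, might look roughly like the sketch below. `RuleLineSketch` and `ParsedRule` are hypothetical; the real class builds `Rule` objects (presumably via `createNewRule`) and reports bad rules with `InvalidRuleException`, for which `IllegalArgumentException` stands in here. The two-parameter limit follows the rule form above, and the condition names (`PREVTAG`, `PREVWD`, `NEXTTAG`) are borrowed from Brill's templates for illustration only; this page does not list the conditions HeppleTagger accepts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RuleLineSketch {
    // Hypothetical holder for one "fromtag totag condition [param1 [param2]]" line.
    static final class ParsedRule {
        final String fromTag;
        final String toTag;
        final String condition;
        final String[] params;

        ParsedRule(String fromTag, String toTag, String condition, String[] params) {
            this.fromTag = fromTag;
            this.toTag = toTag;
            this.condition = condition;
            this.params = params;
        }
    }

    // Splits on whitespace; requires fromtag, totag, and condition, and
    // allows at most two optional parameters.
    static ParsedRule parse(String line) {
        String[] fields = line.trim().split("\\s+");
        if (fields.length < 3 || fields.length > 5) {
            throw new IllegalArgumentException("Bad rule: " + line);
        }
        return new ParsedRule(fields[0], fields[1], fields[2],
            Arrays.copyOfRange(fields, 3, fields.length));
    }

    // Builds a map keyed by fromtag; each list keeps the rules' original order.
    static Map<String, List<ParsedRule>> groupByFromTag(String[] lines) {
        Map<String, List<ParsedRule>> rules = new LinkedHashMap<>();
        for (String line : lines) {
            ParsedRule rule = parse(line);
            rules.computeIfAbsent(rule.fromTag, k -> new ArrayList<>()).add(rule);
        }
        return rules;
    }

    public static void main(String[] args) {
        String[] lines = {"NN VB PREVTAG TO", "VBD VBN PREVWD has", "NN JJ NEXTTAG NN"};
        Map<String, List<ParsedRule>> rules = groupByFromTag(lines);
        System.out.println(rules.keySet());          // [NN, VBD]
        System.out.println(rules.get("NN").size());  // 2
    }
}
```

    Keeping each list in its original order matters if, as in a decision list, the first applicable rule wins.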
  - staart
```
protected static final java.lang.String staart
```
    Marks unused positions in sliding word buffer.
    
    See Also:
    Constant Field Values
  - staartLex
```
protected static final java.lang.String[] staartLex
```
  - staartWordAndTag
```
protected static final AdornedWord staartWordAndTag
```
  - wordBuff
```
public java.lang.String[] wordBuff
```
    Sliding word buffer.
  - tagBuff
```
public java.lang.String[] tagBuff
```
    Sliding tag buffer.
  - lexBuff
```
public java.lang.String[][] lexBuff
```
    Sliding parts of speech buffer.
  - debug
```
protected boolean debug
```
    Debug flag.
- Constructor Detail
  - HeppleTagger
```
public HeppleTagger()
```
    Construct a Hepple POS tagger.
- Method Detail
  - usesContextRules
```
public boolean usesContextRules()
```
    See if tagger uses context rules.
    
    Specified by:
    
    usesContextRules in interface PartOfSpeechTagger
    
    Overrides:
    
    usesContextRules in class AbstractPartOfSpeechTagger
    
    Returns:
    true, since the Hepple tagger uses context rules.
  - setContextRules
```
public void setContextRules(java.lang.String[] contextRules)
                     throws InvalidRuleException
```
    Set context rules for tagging.
    
    Specified by:
    
    setContextRules in interface PartOfSpeechTagger
    
    Overrides:
    
    setContextRules in class AbstractPartOfSpeechTagger
    
    Parameters:
    contextRules - String array of context rules.
    
    Throws:
    
    InvalidRuleException - if a rule is bad.
  - createNewRule
```
protected Rule createNewRule(java.lang.String ruleId)
                      throws InvalidRuleException
```
    Creates a new rule of the required type according to the provided ID.
    
    Parameters:
    ruleId - The ID for the rule to be created
    
    Throws:
    
    InvalidRuleException
  - tagAdornedWordList
```
public <T extends AdornedWord> java.util.List<T> tagAdornedWordList(java.util.List<T> sentence)
```
    Tag an adorned word list.
    
    Specified by:
    
    tagAdornedWordList in interface PartOfSpeechTagger
    
    Specified by:
    
    tagAdornedWordList in class AbstractPartOfSpeechTagger
    
    Parameters:
    sentence - The sentence as a list of AdornedWord objects.
    
    Returns:
    The list of words in the sentence, tagged with parts of speech.
    The input sentence is a list of AdornedWord objects to be tagged. The output is the same list of words with parts of speech added.
  - oneStep
```
protected boolean oneStep(AdornedWord word,
              boolean isFirstWord,
              java.util.List taggedSentence)
```
    Adds a new word to the current tagging window.
    
    Parameters:
    word - The new word to add.
    isFirstWord - True if word is first in sentence.
    taggedSentence - A List of adorned words representing the results of tagging the current sentence so far.
    
    Returns:
    true if a full sentence is now tagged, false otherwise.
    Adds the new word at the last position of the current seven-word window and tags the word currently in the middle (position 3). This method also reads the word at the first position and adds it, with its tag, to taggedSentence, because that word would be lost at the next advance. Returns true if this word completes a sentence, false otherwise.
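    A stripped-down model of this seven-slot window, covering only the word buffer, is sketched below. `SlidingWindowSketch`, `step`, and `run` are hypothetical names; `PAD` stands in for the `staart` marker, whose actual value is not shown on this page; and flushing the window with six padding entries at the end of a sentence is this sketch's own assumption. Rule application is reduced to recording which word occupies the middle slot, and the sentence-complete return value is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SlidingWindowSketch {
    static final String PAD = "<pad>"; // hypothetical stand-in for "staart"
    static final int SIZE = 7;
    static final int MIDDLE = 3;

    final String[] wordBuff = new String[SIZE];

    SlidingWindowSketch() {
        Arrays.fill(wordBuff, PAD); // all slots start out unused
    }

    // One advance: shift left, put the new word in the last slot, record the
    // word now in the middle (where the real tagger would apply its rules),
    // and return the word in the first slot, which the next shift would lose.
    String step(String word, List<String> middles) {
        System.arraycopy(wordBuff, 1, wordBuff, 0, SIZE - 1);
        wordBuff[SIZE - 1] = word;
        if (!PAD.equals(wordBuff[MIDDLE])) {
            middles.add(wordBuff[MIDDLE]);
        }
        return PAD.equals(wordBuff[0]) ? null : wordBuff[0];
    }

    // Feeds a sentence through the window, then pushes SIZE - 1 padding
    // entries so every word passes the middle and first slots exactly once.
    static List<String> run(List<String> sentence, List<String> middles) {
        SlidingWindowSketch window = new SlidingWindowSketch();
        List<String> emitted = new ArrayList<>();
        List<String> input = new ArrayList<>(sentence);
        for (int i = 0; i < SIZE - 1; i++) {
            input.add(PAD);
        }
        for (String word : input) {
            String out = window.step(word, middles);
            if (out != null) {
                emitted.add(out);
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> middles = new ArrayList<>();
        List<String> emitted = run(List.of("The", "cat", "sat", "."), middles);
        System.out.println(middles); // [The, cat, sat, .]
        System.out.println(emitted); // [The, cat, sat, .]
    }
}
```

    Each word reaches the middle slot three steps after entering and the first slot three steps after that, so a word's tag can still be changed while it has three neighbors of context on either side, and it is recorded only once no further advance can touch it.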
  - retagSentence
```
public <T extends AdornedWord> java.util.List<T> retagSentence(java.util.List<T> sentence)
```
    Retag one sentence.
    
    Specified by:
    
    retagSentence in interface PartOfSpeechRetagger
    
    Parameters:
    sentence - List of adorned words to retag.
    
    Returns:
    List of retagged words.
  - oneRetagStep
```
protected <T extends AdornedWord> boolean oneRetagStep(T adornedWord,
                                           boolean isFirstWord,
                                           java.util.List<T> taggedSentence)
```
    Adds a new word to the current retagging window.
    
    Parameters:
    adornedWord - The new word and its tag.
    isFirstWord - True if word is first in sentence.
    taggedSentence - A List of adorned words representing the results of tagging the current sentence so far.
    
    Returns:
    true if a full sentence is now tagged, false otherwise.
    Adds the new word at the last position of the current seven-word window and tags the word currently in the middle (position 3). This method also reads the word at the first position and adds it, with its tag, to taggedSentence, because that word would be lost at the next advance. Returns true if this word completes a sentence, false otherwise.
  - getPartsOfSpeech
```
protected java.lang.String[] getPartsOfSpeech(java.lang.String word,
                                  boolean isFirstWord)
```
    Get parts of speech for a word.
    
    Parameters:
    word - The word to be classified.
    isFirstWord - True if word is first word in sentence.
    
    Returns:
    String array of potential parts of speech.
    The lexicon must always return one or more parts of speech. In addition, for this tagger, the most frequently occurring tag must be the first one in the returned string array.
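    One way to meet this ordering requirement, given per-tag counts for a word, is sketched below; presumably the first entry serves as the word's initial tag before any rules fire. `TagOrderSketch`, `tagsMostFrequentFirst`, and the `fallbackTag` parameter are hypothetical; in MorphAdorner the candidate tags come from the lexicon and, for unknown words, presumably from the part of speech guesser. The counts in the example are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TagOrderSketch {
    // Returns candidate tags ordered by descending count, so the most
    // frequent tag comes first. List.sort is stable, so equal counts keep
    // the map's iteration order. An empty map yields the fallback tag,
    // preserving the "one or more parts of speech" guarantee.
    static String[] tagsMostFrequentFirst(Map<String, Integer> tagCounts, String fallbackTag) {
        if (tagCounts.isEmpty()) {
            return new String[] {fallbackTag};
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(tagCounts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        String[] tags = new String[entries.size()];
        for (int i = 0; i < tags.length; i++) {
            tags[i] = entries.get(i).getKey();
        }
        return tags;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("VB", 12);
        counts.put("NN", 30);
        counts.put("JJ", 3);
        System.out.println(String.join(" ", tagsMostFrequentFirst(counts, "NN"))); // NN VB JJ
        System.out.println(String.join(" ", tagsMostFrequentFirst(Map.of(), "NN"))); // NN
    }
}
```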
  - getCanAddOrDeleteWords
```
public boolean getCanAddOrDeleteWords()
```
    Can retagger add or delete words in the original sentence?
    
    Specified by:
    
    getCanAddOrDeleteWords in interface PartOfSpeechRetagger
    
    Returns:
    true if retagger can add or delete words.
  - setCanAddOrDeleteWords
```
public void setCanAddOrDeleteWords(boolean canAddOrDeleteWords)
```
    Can retagger add or delete words in the original sentence?
    
    Specified by:
    
    setCanAddOrDeleteWords in interface PartOfSpeechRetagger
    
    Parameters:
    canAddOrDeleteWords - true if retagger can add or delete words.
    Ignored here.
  - toString
```
public java.lang.String toString()
```
    Return tagger description.
    
    Overrides:
    
    toString in class java.lang.Object
    
    Returns:
    Tagger description.
