public class TrainingWord
extends java.lang.Object
The training data for a rule-based part of speech tagger consists of a tagged corpus in which each line contains three columns separated by an ASCII tab character: the word's spelling, its correct part of speech tag, and the part of speech tag guessed by the tagger.
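The sketch below reads such a corpus into a list of TrainingWord entries. It assumes the column order spelling, correct tag, guessed tag, and it uses a minimal stand-in class that mirrors only the public fields and constructor documented on this page. TrainingCorpusReader, parseLine and parseAll are hypothetical names and are not part of the documented API.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal stand-in mirroring the documented TrainingWord fields (assumption:
// the real class comes from the tagger's library and is not reproduced here).
class TrainingWord {
    public final String spelling;
    public final String correctTag;
    public String guessedTag;
    public boolean covered;

    public TrainingWord(String spelling, String correctTag, String guessedTag, boolean covered) {
        this.spelling = spelling;
        this.correctTag = correctTag;
        this.guessedTag = guessedTag;
        this.covered = covered;
    }
}

class TrainingCorpusReader {
    // Parses one tab-separated line, assuming the column order
    // spelling, correct tag, guessed tag. Returns null unless the line
    // has exactly three columns.
    static TrainingWord parseLine(String line) {
        String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
        if (cols.length != 3) {
            return null;
        }
        // Every site starts out uncovered; a later pass decides which to flag.
        return new TrainingWord(cols[0], cols[1], cols[2], false);
    }

    // Parses all lines, silently skipping malformed ones.
    static List<TrainingWord> parseAll(List<String> lines) {
        List<TrainingWord> sites = new ArrayList<>();
        for (String line : lines) {
            TrainingWord w = parseLine(line);
            if (w != null) {
                sites.add(w);
            }
        }
        return sites;
    }

    public static void main(String[] args) {
        List<TrainingWord> sites = parseAll(List.of("The\tDT\tDT", "can\tMD\tNN", "malformed line"));
        for (TrainingWord w : sites) {
            System.out.println(w.spelling + " correct=" + w.correctTag + " guessed=" + w.guessedTag);
        }
    }
}
```

Returning null for malformed lines keeps the sketch short; a production reader would more likely report the offending line number.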
A list of such training data entries, or "sites", can be used by a transformation-based learning program to generate rules that attempt to correct erroneously guessed part of speech tags. As rules are generated which "cover" specific training sites, those sites may need to be marked as unavailable for use by other rules. We use a flag to mark training sites already covered by a correction rule, or sites to which no rule can apply: for example, words that are punctuation marks or that have only one possible part of speech tag.
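A minimal sketch of the marking step follows. It assumes a hypothetical lexicon mapping each spelling to its set of possible tags; the lexicon, CoverageMarker and its methods are illustrative assumptions and are not part of this class. Punctuation tokens and single-tag words are flagged as covered up front, so only uncovered, mistagged sites remain as candidates for new rules. The punctuation test used here (a non-empty token with no letters or digits) is a simplification.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Minimal stand-in mirroring the documented TrainingWord fields (assumption:
// the real class comes from the tagger's library and is not reproduced here).
class TrainingWord {
    public final String spelling;
    public final String correctTag;
    public String guessedTag;
    public boolean covered;

    public TrainingWord(String spelling, String correctTag, String guessedTag, boolean covered) {
        this.spelling = spelling;
        this.correctTag = correctTag;
        this.guessedTag = guessedTag;
        this.covered = covered;
    }
}

class CoverageMarker {
    // Simplified punctuation test (assumption): non-empty, no letters or digits.
    static boolean isPunctuation(String spelling) {
        return !spelling.isEmpty() && spelling.chars().noneMatch(Character::isLetterOrDigit);
    }

    // Flags sites to which no rule can apply. The lexicon
    // (spelling -> possible tags) is a hypothetical input.
    static void markUnchangeable(List<TrainingWord> sites, Map<String, Set<String>> lexicon) {
        for (TrainingWord w : sites) {
            Set<String> possibleTags = lexicon.get(w.spelling);
            boolean singleTag = possibleTags != null && possibleTags.size() == 1;
            if (isPunctuation(w.spelling) || singleTag) {
                w.covered = true;
            }
        }
    }

    // Counts sites still available to new rules whose guessed tag is wrong.
    static long countOpenErrors(List<TrainingWord> sites) {
        return sites.stream()
                .filter(w -> !w.covered && !w.guessedTag.equals(w.correctTag))
                .count();
    }

    public static void main(String[] args) {
        List<TrainingWord> sites = List.of(
                new TrainingWord(",", ",", ",", false),
                new TrainingWord("the", "DT", "DT", false),
                new TrainingWord("can", "MD", "NN", false));
        Map<String, Set<String>> lexicon = Map.of(
                "the", Set.of("DT"),
                "can", Set.of("MD", "NN", "VB"));
        markUnchangeable(sites, lexicon);
        System.out.println("open errors: " + countOpenErrors(sites));
    }
}
```

Only the ambiguous, mistagged "can" stays available to the learner; the comma and "the" are flagged before any rule is generated.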
Modifier and Type | Field and Description |
---|---|
java.lang.String | correctTag - Correct tag. |
boolean | covered - True if the word is covered by a rule or is not subject to change. |
java.lang.String | guessedTag - Guessed tag. |
java.lang.String | spelling - Spelling. |
Constructor and Description |
---|
TrainingWord(java.lang.String spelling, java.lang.String correctTag, java.lang.String guessedTag, boolean covered) - Create a training word entry. |
public final java.lang.String spelling
public final java.lang.String correctTag
public java.lang.String guessedTag
public boolean covered
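Only guessedTag and covered are mutable; spelling and correctTag are final. The sketch below shows how a learner might use that split when applying one candidate rule. The rule form (change guessed tag `from` to `to` when the preceding guessed tag is `prevTag`) is a hypothetical Brill-style template chosen for illustration, and RuleApplier.apply is not part of the documented API. Covered sites are skipped, and a site is flagged as covered once the rule corrects it.

```java
import java.util.List;

// Minimal stand-in mirroring the documented TrainingWord fields (assumption:
// the real class comes from the tagger's library and is not reproduced here).
class TrainingWord {
    public final String spelling;
    public final String correctTag;
    public String guessedTag;
    public boolean covered;

    public TrainingWord(String spelling, String correctTag, String guessedTag, boolean covered) {
        this.spelling = spelling;
        this.correctTag = correctTag;
        this.guessedTag = guessedTag;
        this.covered = covered;
    }
}

class RuleApplier {
    // Hypothetical rule: change guessed tag `from` to `to` wherever the
    // preceding site's guessed tag is `prevTag`. Skips covered sites, flags
    // corrected sites as covered, and returns the number corrected.
    static int apply(List<TrainingWord> sites, String from, String to, String prevTag) {
        // Snapshot the guessed tags so every match is judged against the tags
        // as they stood before this pass (one rewrite cannot trigger another).
        String[] before = new String[sites.size()];
        for (int i = 0; i < before.length; i++) {
            before[i] = sites.get(i).guessedTag;
        }
        int corrected = 0;
        for (int i = 1; i < sites.size(); i++) {
            TrainingWord w = sites.get(i);
            if (w.covered || !before[i].equals(from) || !before[i - 1].equals(prevTag)) {
                continue;
            }
            w.guessedTag = to; // mutable field; spelling and correctTag stay fixed
            if (w.guessedTag.equals(w.correctTag)) {
                w.covered = true; // now unavailable to later rules
                corrected++;
            }
        }
        return corrected;
    }

    public static void main(String[] args) {
        List<TrainingWord> sites = List.of(
                new TrainingWord("I", "PRP", "PRP", false),
                new TrainingWord("can", "MD", "NN", false),
                new TrainingWord("the", "DT", "DT", false),
                new TrainingWord("can", "NN", "NN", false));
        int corrected = apply(sites, "NN", "MD", "PRP");
        System.out.println("corrected " + corrected + " site(s)");
    }
}
```

Snapshotting the tags before rewriting is one possible design choice; a left-to-right learner could instead let earlier rewrites feed later matches within the same pass.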
public TrainingWord(java.lang.String spelling, java.lang.String correctTag, java.lang.String guessedTag, boolean covered)
Parameters:
spelling - The spelling.
correctTag - The correct tag.
guessedTag - The guessed tag.
covered - True if the word's tag is not subject to change.