public interface PartOfSpeechTags
Each entry in the part of speech properties file takes the following form:
tagn.name=postag
tagn.generalwordclass=general word class
tagn.lemmawordclass=lemma word class
tagn.majorwordclass=major word class
tagn.wordclass=word class
tagn.description=extended description
where name is the part of speech tag (postag), wordclass is the word class for the part of speech, majorwordclass is the major word class for the part of speech, lemmawordclass is the word class for lemmatization purposes, and generalwordclass is the associated general word class, if any (see below). The name, wordclass, majorwordclass, and lemmawordclass fields must be provided. The generalwordclass and description fields are optional.
The tag properties file must be encoded using the UTF-8 character set.
Example: the singular proper noun definition for the NUPOS tag set.
tag104.name=np1
tag104.generalwordclass=noun-proper-singular
tag104.lemmawordclass=none
tag104.majorwordclass=noun
tag104.wordclass=proper noun
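As an illustration of the format above, the following is a minimal sketch of reading such entries with `java.util.Properties`. The class `TagPropertiesSketch`, its nested `TagEntry` holder, and the `parse` method are hypothetical names made up for this example; they are not part of the library, and the library's own loader may well work differently. The sketch reads from a `Reader` because `Properties.load(InputStream)` assumes ISO 8859-1, whereas the tag file is UTF-8.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

public class TagPropertiesSketch {
    /** Hypothetical holder for one entry; not the library's PartOfSpeech class. */
    static final class TagEntry {
        final String name, wordClass, majorWordClass, lemmaWordClass;
        final String generalWordClass, description; // optional, "" when absent

        TagEntry(String name, String wordClass, String majorWordClass,
                 String lemmaWordClass, String generalWordClass, String description) {
            this.name = name;
            this.wordClass = wordClass;
            this.majorWordClass = majorWordClass;
            this.lemmaWordClass = lemmaWordClass;
            this.generalWordClass = generalWordClass;
            this.description = description;
        }
    }

    /** Parses "tagN.field=value" lines into entries, checking the required fields. */
    static List<TagEntry> parse(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Collect the distinct "tagN" prefixes (sorted lexically, not numerically).
        TreeSet<String> prefixes = new TreeSet<>();
        for (String key : props.stringPropertyNames()) {
            int dot = key.indexOf('.');
            if (dot > 0) {
                prefixes.add(key.substring(0, dot));
            }
        }
        List<TagEntry> entries = new ArrayList<>();
        for (String p : prefixes) {
            entries.add(new TagEntry(
                required(props, p, "name"),
                required(props, p, "wordclass"),
                required(props, p, "majorwordclass"),
                required(props, p, "lemmawordclass"),
                props.getProperty(p + ".generalwordclass", "").trim(),
                props.getProperty(p + ".description", "").trim()));
        }
        return entries;
    }

    private static String required(Properties props, String prefix, String field) {
        String value = props.getProperty(prefix + "." + field);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing required field " + prefix + "." + field);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        // For a real file, a UTF-8 reader could come from
        // Files.newBufferedReader(path, StandardCharsets.UTF_8).
        String sample = String.join("\n",
            "tag104.name=np1",
            "tag104.generalwordclass=noun-proper-singular",
            "tag104.lemmawordclass=none",
            "tag104.majorwordclass=noun",
            "tag104.wordclass=proper noun");
        for (TagEntry e : parse(new StringReader(sample))) {
            System.out.println(e.name + " -> " + e.wordClass + " / " + e.majorWordClass);
        }
    }
}
```

Treating a missing required field as an error is a design choice of this sketch; the source only states which fields must be provided, not how a violation is handled.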
The following general word class names allow commonly used tags to be referenced in a way that is independent of any particular tag set. A tag set need not define all of these, but it should define as many as possible.
General tag name | Meaning |
---|---|
adjective | Tag for an adjective. |
adverb | Tag for an adverb. |
foreign-word | Tag for a foreign word. |
foreign-french | Tag for a French word. |
foreign-german | Tag for a German word. |
foreign-greek | Tag for a Greek word. |
foreign-hebrew | Tag for a Hebrew word. |
foreign-italian | Tag for an Italian word. |
foreign-spanish | Tag for a Spanish word. |
foreign-english | Tag for an English word. |
foreign-latin | Tag for a Latin word. |
interjection | Tag for an interjection. |
noun-singular | Tag for a singular noun. |
noun-singular-possessive | Tag for a singular possessive noun. |
noun-plural | Tag for a plural noun. |
noun-plural-possessive | Tag for a plural possessive noun. |
noun-proper-singular | Tag for a singular proper noun. |
noun-proper-singular-possessive | Tag for a singular proper possessive noun. |
noun-proper-plural | Tag for a plural proper noun. |
noun-proper-plural-possessive | Tag for a plural proper possessive noun. |
numeral-cardinal | Tag for a cardinal number. |
numeral-ordinal | Tag for an ordinal number. |
symbol | Tag for a symbol. |
undetermined | Tag for an undetermined part of speech. |
verb | Tag for a verb. |
verb-past | Tag for a past verb. |
verb-past-participle | Tag for a past participle verb. |
verb-present-participle | Tag for a present participle verb. |
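To show how general word class names can decouple calling code from a specific tag set, here is a small sketch of a lookup table keyed by general name. The class `GeneralTagLookupSketch` and its methods are hypothetical. The tags `fw` and `fw-fr` are placeholder values chosen for illustration; only `np1` comes from the example above. Falling back from a language-specific foreign tag to `foreign-word` is an assumption of this sketch, not documented behavior.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class GeneralTagLookupSketch {
    // Maps a general word class name (e.g. "noun-proper-singular") to a concrete tag.
    private final Map<String, String> tagByGeneralClass = new HashMap<>();

    /** Records a tag under its general word class; entries without one are skipped. */
    void register(String tag, String generalWordClass) {
        if (generalWordClass != null && !generalWordClass.isEmpty()) {
            tagByGeneralClass.putIfAbsent(generalWordClass, tag);
        }
    }

    /** Returns the tag for a general word class, or null if the tag set defines none. */
    String tagFor(String generalWordClass) {
        return tagByGeneralClass.get(generalWordClass);
    }

    /** Assumed behavior: try "foreign-<language>", then fall back to "foreign-word". */
    String foreignWordTag(String language) {
        String specific = tagByGeneralClass.get("foreign-" + language.toLowerCase(Locale.ROOT));
        return specific != null ? specific : tagByGeneralClass.get("foreign-word");
    }

    public static void main(String[] args) {
        GeneralTagLookupSketch lookup = new GeneralTagLookupSketch();
        lookup.register("np1", "noun-proper-singular"); // from the NUPOS example
        lookup.register("fw", "foreign-word");          // placeholder tag
        lookup.register("fw-fr", "foreign-french");     // placeholder tag

        System.out.println(lookup.tagFor("noun-proper-singular"));
        System.out.println(lookup.foreignWordTag("French"));
        System.out.println(lookup.foreignWordTag("Latin")); // falls back to the generic tag
        System.out.println(lookup.tagFor("verb"));          // null: not defined in this set
    }
}
```

Returning `null` for an undefined general name mirrors the note that a tag set need not define every name; callers would have to handle that case.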
Modifier and Type | Field and Description |
---|---|
static java.lang.String | ADJECTIVE |
static java.lang.String | ADVERB |
static java.lang.String | CARDINAL_NUMERAL |
static int | DESCRIPTION_INDEX |
static java.lang.String | ENGLISH_WORD |
static java.lang.String | FOREIGN_WORD - General tag names. |
static java.lang.String | FRENCH_WORD |
static int | GENERAL_TAG_NAME_INDEX |
static java.lang.String | GERMAN_WORD |
static java.lang.String | GREEK_WORD |
static java.lang.String | HEBREW_WORD |
static java.lang.String | INTERJECTION |
static java.lang.String | ITALIAN_WORD |
static java.lang.String | LATIN_WORD |
static int | MAJOR_WORDCLASS_INDEX |
static java.lang.String | NONE |
static java.lang.String | ORDINAL_NUMERAL |
static java.lang.String | PLURAL_NOUN |
static java.lang.String | PLURAL_PROPER_NOUN |
static java.lang.String | POSSESSIVE_NOUN |
static java.lang.String | POSSESSIVE_PLURAL_NOUN |
static java.lang.String | POSSESSIVE_PLURAL_PROPER_NOUN |
static java.lang.String | POSSESSIVE_SINGULAR_NOUN |
static java.lang.String | POSSESSIVE_SINGULAR_PROPER_NOUN |
static java.lang.String | PROPER_NOUN |
static java.lang.String | PUNCTUATION |
static java.lang.String | SINGULAR_NOUN |
static java.lang.String | SINGULAR_PROPER_NOUN |
static java.lang.String | SPANISH_WORD |
static java.lang.String | SYMBOL |
static int | TAG_INDEX - Indices for part of speech data values. |
static java.lang.String | UNDETERMINED |
static java.lang.String | VERB |
static java.lang.String | VERB_PAST |
static java.lang.String | VERB_PAST_PARTICIPLE |
static java.lang.String | VERB_PRESENT_PARTICIPLE |
static int | WORDCLASS_INDEX |
Modifier and Type | Method and Description |
---|---|
void | addPartOfSpeech(PartOfSpeech partOfSpeech) - Add a part of speech. |
void | addTag(java.lang.String tag, java.lang.String wordClass, java.lang.String majorWordClass, java.lang.String lemmaWordClass, java.lang.String generalTagName, java.lang.String description) - Add a part of speech tag. |
int | countTags(java.lang.String tag) - Get number of tags comprising this tag. |
java.lang.String | getAdjectiveTag() - Get the part of speech tag for an adjective. |
java.lang.String | getAdverbTag() - Get the part of speech tag for an adverb. |
java.lang.String | getCardinalNumberTag() - Get the part of speech tag for a cardinal number. |
java.lang.String | getCorrespondingCommonNounTag(java.lang.String tag) - Convert proper noun tag to common noun tag. |
java.lang.String | getDescription(java.lang.String tag) - Get the description for the part of speech. |
java.lang.String | getForeignWordTag(java.lang.String language) - Get the part of speech tag for a specified foreign language. |
java.lang.String | getInterjectionTag() - Get the part of speech tag for an interjection. |
java.lang.String | getLemmaWordClass(java.lang.String tag) - Get lemma class for a tag. |
java.lang.String | getMajorWordClass(java.lang.String tag) - Get major word class for a tag. |
java.lang.String | getOrdinalNumberTag() - Get the part of speech tag for an ordinal number. |
java.lang.String | getPastParticipleTag() - Get the part of speech tag for a verbal past participle. |
java.lang.String | getPluralNounTag() - Get the part of speech tag for a plural noun. |
java.lang.String | getPluralProperNounTag() - Get the part of speech tag for a plural proper noun. |
java.lang.String | getPossessivePluralNounTag() - Get the part of speech tag for a possessive plural noun. |
java.lang.String | getPossessivePluralProperNounTag() - Get the part of speech tag for a possessive plural proper noun. |
java.lang.String | getPossessiveSingularNounTag() - Get the part of speech tag for a possessive singular noun. |
java.lang.String | getPossessiveSingularProperNounTag() - Get the part of speech tag for a possessive singular proper noun. |
java.lang.String | getPresentParticipleTag() - Get the part of speech tag for a verbal present participle. |
java.lang.String | getSingularNounTag() - Get the part of speech tag for a singular noun. |
java.lang.String | getSingularProperNounTag() - Get the part of speech tag for a singular proper noun. |
java.lang.String | getSymbolTag() - Get the part of speech tag for a symbol. |
PartOfSpeech | getTag(java.lang.String tag) - Get data for a tag. |
java.util.List<PartOfSpeech> | getTags() - Get list of tag entries in PartOfSpeech format. |
java.lang.String | getTagSeparator() - Get part of speech separator. |
java.lang.String | getUndeterminedTag() - Get undetermined part of speech tag. |
java.lang.String | getVerbPastTag() - Get the part of speech tag for a verb past tense. |
java.lang.String | getVerbTag() - Get the part of speech tag for a verb. |
java.lang.String | getWordClass(java.lang.String tag) - Get word class for a tag. |
boolean | isCompoundTag(java.lang.String tag) - Check if specified tag contains more than one part of speech. |
boolean | isDeterminerTag(java.lang.String tag) - Is tag for a determiner. |
boolean | isForeignWordTag(java.lang.String tag) - Is tag for a foreign word. |
boolean | isInterjectionTag(java.lang.String tag) - Check if specified tag is an interjection. |
boolean | isNounTag(java.lang.String tag) - Is tag for a noun. |
boolean | isNumberTag(java.lang.String tag) - Is tag for a number. |
boolean | isPersonalPronounTag(java.lang.String tag) - Is tag for a personal pronoun. |
boolean | isPronounTag(java.lang.String tag) - Is tag for a pronoun. |
boolean | isProperAdjectiveTag(java.lang.String tag) - Is tag for a proper adjective. |
boolean | isProperNounTag(java.lang.String tag) - Is tag for a proper noun. |
boolean | isPunctuationTag(java.lang.String tag) - Is tag for punctuation. |
boolean | isSingularNounTag(java.lang.String tag) - Is tag for a singular noun. |
boolean | isSymbolTag(java.lang.String tag) - Is tag for a symbol. |
boolean | isTag(java.lang.String tag) - Check if specified tag appears in the tag list. |
boolean | isUndeterminedTag(java.lang.String tag) - Is part of speech tag undetermined. |
boolean | isVerbTag(java.lang.String tag) - Is tag for a verb. |
java.lang.String | joinTags(java.lang.String[] tags) - Join separate tags into a compound tag. |
java.lang.String | joinTags(java.lang.String[] tags, java.lang.String separator) - Join separate tags into a compound tag. |
java.lang.String[] | splitTag(java.lang.String tag) - Split compound tag into separate tags. |
static final java.lang.String FOREIGN_WORD
static final java.lang.String ENGLISH_WORD
static final java.lang.String FRENCH_WORD
static final java.lang.String GERMAN_WORD
static final java.lang.String GREEK_WORD
static final java.lang.String ITALIAN_WORD
static final java.lang.String HEBREW_WORD
static final java.lang.String LATIN_WORD
static final java.lang.String SPANISH_WORD
static final java.lang.String SINGULAR_NOUN
static final java.lang.String PLURAL_NOUN
static final java.lang.String POSSESSIVE_NOUN
static final java.lang.String POSSESSIVE_SINGULAR_NOUN
static final java.lang.String POSSESSIVE_PLURAL_NOUN
static final java.lang.String SINGULAR_PROPER_NOUN
static final java.lang.String PROPER_NOUN
static final java.lang.String PLURAL_PROPER_NOUN
static final java.lang.String POSSESSIVE_SINGULAR_PROPER_NOUN
static final java.lang.String POSSESSIVE_PLURAL_PROPER_NOUN
static final java.lang.String CARDINAL_NUMERAL
static final java.lang.String ORDINAL_NUMERAL
static final java.lang.String ADVERB
static final java.lang.String ADJECTIVE
static final java.lang.String INTERJECTION
static final java.lang.String UNDETERMINED
static final java.lang.String VERB
static final java.lang.String VERB_PAST
static final java.lang.String VERB_PAST_PARTICIPLE
static final java.lang.String VERB_PRESENT_PARTICIPLE
static final java.lang.String SYMBOL
static final java.lang.String PUNCTUATION
static final java.lang.String NONE
static final int TAG_INDEX
static final int WORDCLASS_INDEX
static final int MAJOR_WORDCLASS_INDEX
static final int GENERAL_TAG_NAME_INDEX
static final int DESCRIPTION_INDEX
void addTag(java.lang.String tag, java.lang.String wordClass, java.lang.String majorWordClass, java.lang.String lemmaWordClass, java.lang.String generalTagName, java.lang.String description)
tag - Tag name.
wordClass - The word class.
majorWordClass - The major word class.
lemmaWordClass - The lemma word class.
generalTagName - The general tag name.
description - The description.
void addPartOfSpeech(PartOfSpeech partOfSpeech)
partOfSpeech - The part of speech to add.
java.lang.String getSingularNounTag()
java.lang.String getPluralNounTag()
java.lang.String getPossessiveSingularNounTag()
java.lang.String getPossessivePluralNounTag()
java.lang.String getSingularProperNounTag()
java.lang.String getPluralProperNounTag()
java.lang.String getPossessiveSingularProperNounTag()
java.lang.String getPossessivePluralProperNounTag()
java.lang.String getCardinalNumberTag()
java.lang.String getOrdinalNumberTag()
java.lang.String getAdverbTag()
java.lang.String getAdjectiveTag()
java.lang.String getInterjectionTag()
java.lang.String getVerbTag()
java.lang.String getVerbPastTag()
java.lang.String getPastParticipleTag()
java.lang.String getPresentParticipleTag()
java.lang.String getSymbolTag()
java.lang.String getUndeterminedTag()
java.lang.String getForeignWordTag(java.lang.String language)
language - The foreign language.
java.lang.String getDescription(java.lang.String tag)
tag - The part of speech tag.
java.lang.String getWordClass(java.lang.String tag)
tag - The part of speech tag.
java.lang.String getMajorWordClass(java.lang.String tag)
tag - The part of speech tag.
java.lang.String getLemmaWordClass(java.lang.String tag)
tag - The part of speech tag.
java.lang.String getCorrespondingCommonNounTag(java.lang.String tag)
tag - The part of speech tag.
boolean isDeterminerTag(java.lang.String tag)
tag - The part of speech tag.
boolean isNounTag(java.lang.String tag)
tag - The part of speech tag.
boolean isSingularNounTag(java.lang.String tag)
tag - The part of speech tag.
boolean isProperNounTag(java.lang.String tag)
tag - The part of speech tag.
boolean isProperAdjectiveTag(java.lang.String tag)
tag - The part of speech tag.
boolean isVerbTag(java.lang.String tag)
tag - The part of speech tag.
boolean isPronounTag(java.lang.String tag)
tag - The part of speech tag.
boolean isPersonalPronounTag(java.lang.String tag)
tag - The part of speech tag.
boolean isForeignWordTag(java.lang.String tag)
tag - The part of speech tag.
boolean isNumberTag(java.lang.String tag)
tag - The part of speech tag.
boolean isSymbolTag(java.lang.String tag)
tag - The part of speech tag.
boolean isPunctuationTag(java.lang.String tag)
tag - The part of speech tag.
boolean isUndeterminedTag(java.lang.String tag)
tag - Tag to check for being undetermined.
boolean isTag(java.lang.String tag)
tag - The part of speech tag.
boolean isCompoundTag(java.lang.String tag)
tag - The part of speech tag.
boolean isInterjectionTag(java.lang.String tag)
tag - The part of speech tag.
java.lang.String getTagSeparator()
java.lang.String joinTags(java.lang.String[] tags, java.lang.String separator)
tags - String array of part of speech tags.
separator - String to separate tags.
java.lang.String joinTags(java.lang.String[] tags)
tags - String array of part of speech tags.
java.lang.String[] splitTag(java.lang.String tag)
tag - The part of speech tag.
int countTags(java.lang.String tag)
tag - The part of speech tag.
PartOfSpeech getTag(java.lang.String tag)
tag - The tag name.
java.util.List<PartOfSpeech> getTags()
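
The compound tag methods (joinTags, splitTag, countTags, isCompoundTag) are described only briefly above, so the following sketch illustrates one plausible reading of how they relate. It is not the library's implementation: the class `CompoundTagSketch` is hypothetical, and the separator `"|"` is an assumption made for this example. The actual separator is whatever getTagSeparator() returns. The tag values `tagA` and `tagB` are placeholders.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class CompoundTagSketch {
    // Assumed separator for illustration only; see getTagSeparator() for the real one.
    static final String SEPARATOR = "|";

    /** Joins separate tags into one compound tag using the given separator. */
    static String joinTags(String[] tags, String separator) {
        return String.join(separator, tags);
    }

    /** Joins separate tags using the assumed default separator. */
    static String joinTags(String[] tags) {
        return joinTags(tags, SEPARATOR);
    }

    /** Splits a compound tag; the separator is quoted because "|" is a regex operator. */
    static String[] splitTag(String tag) {
        return tag.split(Pattern.quote(SEPARATOR));
    }

    /** Counts the parts of a tag; a simple tag counts as one. */
    static int countTags(String tag) {
        return splitTag(tag).length;
    }

    /** A tag is compound when it contains more than one part of speech. */
    static boolean isCompoundTag(String tag) {
        return countTags(tag) > 1;
    }

    public static void main(String[] args) {
        String compound = joinTags(new String[] {"tagA", "tagB"});
        System.out.println(compound);                            // tagA|tagB
        System.out.println(Arrays.toString(splitTag(compound))); // [tagA, tagB]
        System.out.println(countTags(compound));                 // 2
        System.out.println(isCompoundTag("np1"));                // false
    }
}
```

Under these assumptions, splitTag and joinTags are inverses of each other for tags that do not themselves contain the separator.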