public class RelemmatizeFilter extends ExtendedXMLFilterImpl
Modifier and Type | Field and Description |
---|---|
protected java.lang.String | lemmaSeparator: Lemma separator. |
protected int | lemmataChanged: Number of lemmata changed. |
protected Lemmatizer | lemmatizer: Lemmatizer. |
protected NameStandardizer | nameStandardizer: Name standardizer. |
protected PartOfSpeechTags | partOfSpeechTags: Part-of-speech tags. |
protected SpellingMapper | spellingMapper: Spelling mapper. |
protected WordTokenizer | spellingTokenizer: Spelling tokenizer. |
protected int | standardChanged: Number of standard spellings changed. |
protected SpellingStandardizer | standardizer: Spelling standardizer. |
protected Lexicon | wordLexicon: Word lexicon. |
protected int | wordsProcessed: Number of words processed. |
Constructor and Description |
---|
RelemmatizeFilter(org.xml.sax.XMLReader reader, Lexicon wordLexicon, Lemmatizer lemmatizer, NameStandardizer nameStandardizer, SpellingStandardizer standardizer, SpellingMapper spellingMapper): Create adorned word info filter. |
Modifier and Type | Method and Description |
---|---|
java.lang.String | getLemma(java.lang.String spelling, java.lang.String partOfSpeech): Get lemma for a word. |
int | getLemmataChanged(): Return number of lemmata changed. |
int | getStandardChanged(): Return number of standard spellings changed. |
protected java.lang.String | getStandardizedSpelling(java.lang.String correctedSpelling, java.lang.String partOfSpeech): Get standardized spelling. |
int | getWordsProcessed(): Return number of words processed. |
void | startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts): Handle start of an XML element. |
Methods inherited from class ExtendedXMLFilterImpl:
removeAttribute, setAttributeValue, setAttributeValue, setAttributeValue
Methods inherited from class org.xml.sax.helpers.XMLFilterImpl:
characters, endDocument, endElement, endPrefixMapping, error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, getProperty, ignorableWhitespace, notationDecl, parse, parse, processingInstruction, resolveEntity, setContentHandler, setDocumentLocator, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, setProperty, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
protected Lexicon wordLexicon
protected Lemmatizer lemmatizer
protected NameStandardizer nameStandardizer
protected SpellingStandardizer standardizer
protected SpellingMapper spellingMapper
protected PartOfSpeechTags partOfSpeechTags
protected WordTokenizer spellingTokenizer
protected java.lang.String lemmaSeparator
protected int lemmataChanged
protected int standardChanged
protected int wordsProcessed
public RelemmatizeFilter(org.xml.sax.XMLReader reader, Lexicon wordLexicon, Lemmatizer lemmatizer, NameStandardizer nameStandardizer, SpellingStandardizer standardizer, SpellingMapper spellingMapper)
Create adorned word info filter.
Parameters:
reader - XML input reader to which this filter applies.
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) throws org.xml.sax.SAXException
Handle start of an XML element.
Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
uri - The XML element's URI.
localName - The XML element's local name.
qName - The XML element's qualified name.
atts - The XML element's attributes.
Throws:
org.xml.sax.SAXException
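The documentation says only that startElement handles the start of an XML element. As a rough illustration of how an org.xml.sax.helpers.XMLFilterImpl subclass can rewrite word attributes while SAX events pass through, here is a minimal self-contained sketch. It is not the RelemmatizeFilter implementation and uses none of the MorphAdorner classes (Lexicon, Lemmatizer, and so on). The element name w, the attribute names spe and lem, the Map-based lemma table, and the class names RelemmatizeSketch and LemmaFilter are all hypothetical. The two counters loosely mirror the wordsProcessed and lemmataChanged fields.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class RelemmatizeSketch {

    /** Hypothetical filter: rewrites the "lem" attribute of each <w> element. */
    static class LemmaFilter extends XMLFilterImpl {
        private final Map<String, String> lemmaTable; // hypothetical spelling -> lemma table
        int wordsProcessed = 0;
        int lemmataChanged = 0;

        LemmaFilter(XMLReader parent, Map<String, String> lemmaTable) {
            super(parent); // the parent reader feeds events into this filter
            this.lemmaTable = lemmaTable;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            Attributes out = atts;
            String spelling = atts.getValue("spe");
            if ("w".equals(qName) && spelling != null) {
                wordsProcessed++;
                String oldLemma = atts.getValue("lem");
                // Hypothetical lookup: fall back to the spelling itself when the table has no entry.
                String newLemma = lemmaTable.getOrDefault(spelling, spelling);
                if (oldLemma != null && !newLemma.equals(oldLemma)) {
                    // The Attributes interface has no setters, so edit a mutable copy.
                    AttributesImpl copy = new AttributesImpl(atts);
                    copy.setValue(copy.getIndex("lem"), newLemma);
                    out = copy;
                    lemmataChanged++;
                }
            }
            // Pass the (possibly modified) event on to the downstream ContentHandler.
            super.startElement(uri, localName, qName, out);
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> table = new HashMap<>();
        table.put("ran", "run");
        table.put("mice", "mouse");

        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        LemmaFilter filter = new LemmaFilter(parser, table);

        // Downstream handler that just records the lemma it receives for each word.
        StringBuilder seen = new StringBuilder();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("w".equals(qName)) {
                    seen.append(atts.getValue("lem")).append(' ');
                }
            }
        });

        String xml = "<s><w spe=\"ran\" lem=\"ran\"/><w spe=\"mice\" lem=\"mouse\"/>"
                + "<w spe=\"dog\" lem=\"dog\"/></s>";
        filter.parse(new InputSource(new StringReader(xml)));

        System.out.println(seen.toString().trim()); // run mouse dog
        System.out.println(filter.wordsProcessed + " words, " + filter.lemmataChanged + " changed"); // 3 words, 1 changed
    }
}
```

Because the filter wraps its parent XMLReader, calling parse on the filter is enough: XMLFilterImpl installs itself as the parent's content handler and forwards every event it does not change.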
public java.lang.String getLemma(java.lang.String spelling, java.lang.String partOfSpeech)
Get lemma for a word.
On output, sets the lemma field of the adorned word. We look in the word lexicon first for the lemma. If the lexicon does not contain the lemma, we use the lemmatizer.
Parameters:
spelling - The word spelling.
partOfSpeech - The part of speech.
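The lookup order described for getLemma (word lexicon first, lemmatizer only on a miss) can be sketched without MorphAdorner as below. This is a hypothetical stand-in, not the library's code: the nested Map plays the role of the word lexicon, a UnaryOperator plays the role of the lemmatizer, and the class name LemmaLookupSketch, the method addEntry, the Penn-style tags, and the toy "strip a trailing s" rule are all illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class LemmaLookupSketch {
    // Hypothetical lexicon: spelling -> (part-of-speech tag -> lemma).
    private final Map<String, Map<String, String>> lexicon = new HashMap<>();
    // Hypothetical stand-in for a lemmatizer, consulted only on lexicon misses.
    private final UnaryOperator<String> fallbackLemmatizer;

    public LemmaLookupSketch(UnaryOperator<String> fallbackLemmatizer) {
        this.fallbackLemmatizer = fallbackLemmatizer;
    }

    public void addEntry(String spelling, String partOfSpeech, String lemma) {
        lexicon.computeIfAbsent(spelling, k -> new HashMap<>()).put(partOfSpeech, lemma);
    }

    /** Lexicon first; fall back to the lemmatizer if the (spelling, tag) pair is not listed. */
    public String getLemma(String spelling, String partOfSpeech) {
        Map<String, String> lemmasByTag = lexicon.get(spelling);
        if (lemmasByTag != null) {
            String lemma = lemmasByTag.get(partOfSpeech);
            if (lemma != null) {
                return lemma; // lexicon hit
            }
        }
        return fallbackLemmatizer.apply(spelling); // lexicon miss
    }

    public static void main(String[] args) {
        // Toy fallback rule: strip a trailing "s". Real lemmatizers are far more elaborate.
        LemmaLookupSketch lookup = new LemmaLookupSketch(
                w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w);
        lookup.addEntry("were", "VBD", "be");

        System.out.println(lookup.getLemma("were", "VBD")); // be   (lexicon hit)
        System.out.println(lookup.getLemma("cats", "NNS")); // cat  (fallback)
        System.out.println(lookup.getLemma("were", "NN"));  // were (tag not listed, fallback)
    }
}
```

Keying the lexicon on both spelling and tag matters because one spelling can take different lemmas under different parts of speech; "saw", for instance, is "see" as a past-tense verb but "saw" as a noun.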
protected java.lang.String getStandardizedSpelling(java.lang.String correctedSpelling, java.lang.String partOfSpeech)
Get standardized spelling.
Parameters:
correctedSpelling - The spelling.
partOfSpeech - The part of speech tag.
public int getLemmataChanged()
Return number of lemmata changed.
public int getStandardChanged()
Return number of standard spellings changed.
public int getWordsProcessed()
Return number of words processed.