java.lang.Object
  org.xml.sax.helpers.XMLFilterImpl
    edu.northwestern.at.utils.xml.ExtendedXMLFilterImpl
      edu.northwestern.at.morphadorner.tools.relemmatize.RelemmatizeFilter
public class RelemmatizeFilter
extends ExtendedXMLFilterImpl
Filter to update the standard spellings and lemmata in an adorned file.
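The constructor takes the XMLReader that the filter wraps, which is the usual SAX filter-chain pattern: events from the parent parser pass through the filter's callbacks before they reach the downstream handler. The sketch below shows that wiring with JDK classes only, under stated assumptions. WordCountingFilter and FilterPipelineSketch are hypothetical stand-ins, not MorphAdorner classes; the real RelemmatizeFilter also needs a Lexicon, Lemmatizer, NameStandardizer, SpellingStandardizer, and SpellingMapper, and treating words as w elements is an illustrative assumption.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical pass-through filter; it only counts <w> elements (assumed word markup). */
class WordCountingFilter extends XMLFilterImpl {
    int wordsProcessed = 0;

    WordCountingFilter(XMLReader parent) {
        super(parent); // the filter wraps its parent reader, like RelemmatizeFilter(reader, ...)
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String name = (localName == null || localName.isEmpty()) ? qName : localName;
        if ("w".equals(name)) {
            wordsProcessed++;
        }
        super.startElement(uri, localName, qName, atts); // forward the event unchanged
    }
}

final class FilterPipelineSketch {
    /** Creates a namespace-aware SAX parser to act as the filter's parent. */
    static XMLReader newParser() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    /** Parses xml through the filter and serializes the filtered events back to a string. */
    static String filterThrough(XMLFilterImpl filter, String xml) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Serializing through an identity Transformer is one convenient way to write the filtered events back out as XML; a real pipeline might write to a file instead.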
| Field Summary | | |
|---|---|---|
| Modifier and Type | Field | Description |
| protected java.lang.String | lemmaSeparator | Lemma separator. |
| protected int | lemmataChanged | Number of lemmata changed. |
| protected Lemmatizer | lemmatizer | Lemmatizer. |
| protected NameStandardizer | nameStandardizer | Name standardizer. |
| protected PartOfSpeechTags | partOfSpeechTags | Part of speech tags. |
| protected SpellingMapper | spellingMapper | Spelling mapper. |
| protected WordTokenizer | spellingTokenizer | Spelling tokenizer. |
| protected int | standardChanged | Number of standard spellings changed. |
| protected SpellingStandardizer | standardizer | Spelling standardizer. |
| protected Lexicon | wordLexicon | Word lexicon. |
| protected int | wordsProcessed | Number of words processed. |
| Constructor Summary | |
|---|---|
| Constructor | Description |
| RelemmatizeFilter(org.xml.sax.XMLReader reader, Lexicon wordLexicon, Lemmatizer lemmatizer, NameStandardizer nameStandardizer, SpellingStandardizer standardizer, SpellingMapper spellingMapper) | Create adorned word info filter. |
| Method Summary | | |
|---|---|---|
| Modifier and Type | Method | Description |
| java.lang.String | getLemma(java.lang.String spelling, java.lang.String partOfSpeech) | Get lemma for a word. |
| int | getLemmataChanged() | Return number of lemmata changed. |
| int | getStandardChanged() | Return number of standard spellings changed. |
| protected java.lang.String | getStandardizedSpelling(java.lang.String correctedSpelling, java.lang.String partOfSpeech) | Get standardized spelling. |
| int | getWordsProcessed() | Return number of words processed. |
| void | startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) | Handle start of an XML element. |
| Methods inherited from class edu.northwestern.at.utils.xml.ExtendedXMLFilterImpl |
|---|
removeAttribute, setAttributeValue, setAttributeValue, setAttributeValue |
| Methods inherited from class org.xml.sax.helpers.XMLFilterImpl |
|---|
characters, endDocument, endElement, endPrefixMapping, error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, getProperty, ignorableWhitespace, notationDecl, parse, parse, processingInstruction, resolveEntity, setContentHandler, setDocumentLocator, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, setProperty, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
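protected Lexicon wordLexicon
Word lexicon.
protected Lemmatizer lemmatizer
Lemmatizer.
protected NameStandardizer nameStandardizer
Name standardizer.
protected SpellingStandardizer standardizer
Spelling standardizer.
protected SpellingMapper spellingMapper
Spelling mapper.
protected PartOfSpeechTags partOfSpeechTags
Part of speech tags.
protected WordTokenizer spellingTokenizer
Spelling tokenizer.
protected java.lang.String lemmaSeparator
Lemma separator.
protected int lemmataChanged
Number of lemmata changed.
protected int standardChanged
Number of standard spellings changed.
protected int wordsProcessed
Number of words processed.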
| Constructor Detail |
|---|
public RelemmatizeFilter(org.xml.sax.XMLReader reader,
Lexicon wordLexicon,
Lemmatizer lemmatizer,
NameStandardizer nameStandardizer,
SpellingStandardizer standardizer,
SpellingMapper spellingMapper)
Create adorned word info filter.
Parameters:
reader - XML input reader to which this filter applies.
| Method Detail |
|---|
public void startElement(java.lang.String uri,
java.lang.String localName,
java.lang.String qName,
org.xml.sax.Attributes atts)
throws org.xml.sax.SAXException
Handle start of an XML element.
Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
uri - The XML element's URI.
localName - The XML element's local name.
qName - The XML element's qname.
atts - The XML element's attributes.
Throws:
org.xml.sax.SAXException
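A filter that updates spellings and lemmata typically does its work in startElement: the incoming Attributes object is read-only, so the filter copies it, changes the relevant values, and forwards the copy. The sketch below shows that pattern with the JDK's AttributesImpl rather than the setAttributeValue helpers inherited from ExtendedXMLFilterImpl, whose signatures are not shown here. LemmaRewritingFilter is hypothetical, and the element name w and the attribute names spe (spelling) and lem (lemma) are assumptions made for illustration.

```java
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/**
 * Hypothetical filter: for each <w> element, replaces the "lem" attribute with
 * the lemma found for its "spe" attribute, counting words and changed lemmata.
 */
class LemmaRewritingFilter extends XMLFilterImpl {
    private final Map<String, String> lemmaBySpelling; // stand-in for a real lexicon
    int wordsProcessed = 0;
    int lemmataChanged = 0;

    LemmaRewritingFilter(Map<String, String> lemmaBySpelling) {
        this.lemmaBySpelling = lemmaBySpelling;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (!"w".equals(qName)) {
            super.startElement(uri, localName, qName, atts); // not a word: pass through
            return;
        }
        wordsProcessed++;
        // Attributes is read-only, so work on a mutable copy.
        AttributesImpl copy = new AttributesImpl(atts);
        String spelling = copy.getValue("spe");
        String newLemma = (spelling == null) ? null : lemmaBySpelling.get(spelling);
        int lemIndex = copy.getIndex("lem");
        if (lemIndex >= 0 && newLemma != null && !newLemma.equals(copy.getValue(lemIndex))) {
            copy.setValue(lemIndex, newLemma);
            lemmataChanged++;
        }
        super.startElement(uri, localName, qName, copy); // forward the updated attributes
    }
}
```

Copying before editing also leaves the caller's Attributes untouched, which matters because a parser may reuse that object for the next element.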
public java.lang.String getLemma(java.lang.String spelling,
java.lang.String partOfSpeech)
Get lemma for a word.
We look in the word lexicon first for the lemma. If the lexicon does not contain the lemma, we use the lemmatizer. On output, sets the lemma field of the adorned word.
Parameters:
spelling - The word spelling.
partOfSpeech - The part of speech.
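The lookup order described for getLemma (word lexicon first, lemmatizer only as a fallback) can be sketched as follows. This is a minimal illustration under assumptions, not MorphAdorner's Lexicon or Lemmatizer API: LemmaLookupSketch is hypothetical, the lexicon is modeled as a map from spelling to a map from part-of-speech tag to lemma, and the lemmatizer is any function of (spelling, part of speech).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical sketch: lexicon lookup first, rule-based lemmatizer as fallback. */
final class LemmaLookupSketch {
    // Lexicon stand-in: spelling -> (part of speech tag -> lemma).
    private final Map<String, Map<String, String>> lexicon = new HashMap<>();
    // Lemmatizer stand-in: (spelling, part of speech) -> lemma.
    private final BiFunction<String, String, String> lemmatizer;

    LemmaLookupSketch(BiFunction<String, String, String> lemmatizer) {
        this.lemmatizer = lemmatizer;
    }

    void addLexiconEntry(String spelling, String partOfSpeech, String lemma) {
        lexicon.computeIfAbsent(spelling, k -> new HashMap<>()).put(partOfSpeech, lemma);
    }

    String getLemma(String spelling, String partOfSpeech) {
        Map<String, String> lemmaByTag = lexicon.get(spelling);
        String lemma = (lemmaByTag == null) ? null : lemmaByTag.get(partOfSpeech);
        // Only consult the lemmatizer when the lexicon has no entry for this spelling and tag.
        return (lemma != null) ? lemma : lemmatizer.apply(spelling, partOfSpeech);
    }
}
```

Keying the lexicon on both spelling and tag means the same spelling can map to different lemmata depending on its part of speech; a pair missing from the lexicon falls through to the lemmatizer.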
protected java.lang.String getStandardizedSpelling(java.lang.String correctedSpelling,
java.lang.String partOfSpeech)
Get standardized spelling.
Parameters:
correctedSpelling - The spelling.
partOfSpeech - The part of speech tag.
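public int getLemmataChanged()
Return number of lemmata changed.
public int getStandardChanged()
Return number of standard spellings changed.
public int getWordsProcessed()
Return number of words processed.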