public class TEITextExtractorHandler
extends org.xml.sax.helpers.DefaultHandler
Extracts only the text between the <text> and </text> tags. No attempt is made to preserve any of the original text divisions marked by the XML tags.
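The behavior described above can be illustrated with a minimal sketch. The class below is a hypothetical stand-in written for this example, not the source of TEITextExtractorHandler: the names TextExtractorSketch, SketchHandler, and extract are assumptions, it matches the element by qName on a parser that is not namespace-aware, and it keeps its state in instance fields rather than the static field documented below.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TextExtractorSketch {

    /** Hypothetical stand-in for the documented handler (an assumption, not the real source). */
    static class SketchHandler extends DefaultHandler {
        private final StringBuilder extractedText = new StringBuilder();
        private boolean inText = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Start collecting once a <text> element opens.
            if ("text".equals(qName)) {
                inText = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // Stop collecting when the <text> element closes.
            if ("text".equals(qName)) {
                inText = false;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // Character data outside <text> is ignored.
            if (inText) {
                extractedText.append(ch, start, length);
            }
        }

        public String getExtractedText() {
            return extractedText.toString();
        }
    }

    /** Parses an XML string and returns the character data found inside <text>. */
    static String extract(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            SketchHandler handler = new SketchHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.getExtractedText();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<TEI><teiHeader><title>Ignored</title></teiHeader>"
                   + "<text><body><p>Hello</p><p>world</p></body></text></TEI>";
        System.out.println(extract(xml)); // prints "Helloworld"
    }
}
```

The two paragraphs come out run together because the element boundaries inside <text> are discarded, which is the loss of text divisions the class description refers to.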
Modifier and Type | Field and Description |
---|---|
protected java.lang.StringBuffer | extractedText - Holds the extracted text. |
protected static boolean | inText - Tracks whether we're inside a <text> element. |
Constructor and Description |
---|
TEITextExtractorHandler() - Create text extractor handler. |
Modifier and Type | Method and Description |
---|---|
void | characters(char[] ch, int start, int length) - Handle character data. |
void | endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) - Handle end of an element. |
java.lang.String | getExtractedText() - Return extracted text. |
void | startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) - Handle start of an XML element. |
protected java.lang.StringBuffer extractedText
protected static boolean inText
public TEITextExtractorHandler()
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) throws org.xml.sax.SAXException
Specified by: startElement in interface org.xml.sax.ContentHandler
Overrides: startElement in class org.xml.sax.helpers.DefaultHandler
Parameters:
uri - The XML element's URI.
localName - The XML element's local name.
qName - The XML element's qname.
atts - The XML element's attributes.
Throws:
org.xml.sax.SAXException
public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) throws org.xml.sax.SAXException
Specified by: endElement in interface org.xml.sax.ContentHandler
Overrides: endElement in class org.xml.sax.helpers.DefaultHandler
Parameters:
uri - The XML element's URI.
localName - The XML element's local name.
qName - The XML element's qname.
Throws:
org.xml.sax.SAXException
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException
Specified by: characters in interface org.xml.sax.ContentHandler
Overrides: characters in class org.xml.sax.helpers.DefaultHandler
Parameters:
ch - Array of characters.
start - The starting position in the array.
length - The number of characters.
Throws:
org.xml.sax.SAXException - If there is an error.
public java.lang.String getExtractedText()