edu.northwestern.at.utils.xml
Class TEITextExtractorHandler

java.lang.Object
  extended by org.xml.sax.helpers.DefaultHandler
      extended by edu.northwestern.at.utils.xml.TEITextExtractorHandler
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler

public class TEITextExtractorHandler
extends org.xml.sax.helpers.DefaultHandler

SAX event handler to extract text from a TEI XML file.

Only the text between the <text> and </text> tags is extracted. No attempt is made to preserve the original text divisions marked by the XML tags.
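
The behavior described above can be illustrated with a minimal sketch. This is not the source of TEITextExtractorHandler; the class name TextElementExtractorSketch, the helper method extract, and the sample document are all hypothetical. The sketch assumes a parser that is not namespace-aware, so the element name arrives in qName, and it uses an instance flag where the documented class declares a static one.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical stand-in that mimics the documented behavior. */
class TextElementExtractorSketch extends DefaultHandler {
    /** Accumulates character data seen inside the text element. */
    private final StringBuffer extractedText = new StringBuffer();

    /** True while the parser is between the start and end of a text element. */
    private boolean inText = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Assumption: non-namespace-aware parsing, so qName carries the tag name.
        if ("text".equals(qName)) {
            inText = true;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("text".equals(qName)) {
            inText = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Keep character data only while inside the text element.
        if (inText) {
            extractedText.append(ch, start, length);
        }
    }

    public String getExtractedText() {
        return extractedText.toString();
    }

    /** Parses an XML string and returns the text found inside the text element. */
    static String extract(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            TextElementExtractorSketch handler = new TextElementExtractorSketch();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.getExtractedText();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String tei = "<TEI><teiHeader><title>Ignored</title></teiHeader>"
                   + "<text><body><p>Hello, </p><p>world.</p></body></text></TEI>";
        System.out.println(extract(tei)); // prints "Hello, world."
    }
}
```

In this sketch the header title is dropped because it lies outside the text element, and the two paragraphs run together because element boundaries inside the text element are not recorded.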


Field Summary
protected  java.lang.StringBuffer extractedText
          Holds the extracted text.
protected static boolean inText
          Tracks whether the parser is currently inside a <text> element.
 
Constructor Summary
TEITextExtractorHandler()
          Create text extractor handler.
 
Method Summary
 void characters(char[] ch, int start, int length)
          Handle character data.
 void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
          Handle end of an element.
 java.lang.String getExtractedText()
          Return extracted text.
 void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
          Handle start of an XML element.
 
Methods inherited from class org.xml.sax.helpers.DefaultHandler
endDocument, endPrefixMapping, error, fatalError, ignorableWhitespace, notationDecl, processingInstruction, resolveEntity, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

extractedText

protected java.lang.StringBuffer extractedText
Holds the extracted text.


inText

protected static boolean inText
Tracks whether the parser is currently inside a <text> element.

Constructor Detail

TEITextExtractorHandler

public TEITextExtractorHandler()
Create text extractor handler.

Method Detail

startElement

public void startElement(java.lang.String uri,
                         java.lang.String localName,
                         java.lang.String qName,
                         org.xml.sax.Attributes atts)
                  throws org.xml.sax.SAXException
Handle start of an XML element.

Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.DefaultHandler
Parameters:
uri - The XML element's namespace URI.
localName - The XML element's local name.
qName - The XML element's qualified name.
atts - The XML element's attributes.
Throws:
org.xml.sax.SAXException - If there is an error.

endElement

public void endElement(java.lang.String uri,
                       java.lang.String localName,
                       java.lang.String qName)
                throws org.xml.sax.SAXException
Handle end of an element.

Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.DefaultHandler
Parameters:
uri - The XML element's namespace URI.
localName - The XML element's local name.
qName - The XML element's qualified name.
Throws:
org.xml.sax.SAXException - If there is an error.

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws org.xml.sax.SAXException
Handle character data.

Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.DefaultHandler
Parameters:
ch - Array of characters containing the character data.
start - The starting position of the character data in the array.
length - The number of characters to read from the array.
Throws:
org.xml.sax.SAXException - If there is an error.
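
A SAX parser may deliver one run of text through several characters calls, and only the slice from start to start + length of ch belongs to the current event. The sketch below shows one way a handler can accumulate such slices. The class ChunkedCharactersSketch and its demo method are hypothetical and drive the callback by hand rather than through a parser; they are not taken from TEITextExtractorHandler.

```java
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler that accumulates character data across several callbacks. */
class ChunkedCharactersSketch extends DefaultHandler {
    private final StringBuffer collected = new StringBuffer();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Append only the reported slice; the rest of the array may hold unrelated data.
        collected.append(ch, start, length);
    }

    String collected() {
        return collected.toString();
    }

    /** Simulates a parser that reports one sentence in two chunks from a shared buffer. */
    static String demo() {
        ChunkedCharactersSketch handler = new ChunkedCharactersSketch();
        char[] buffer = "xxHello, world.yy".toCharArray();
        handler.characters(buffer, 2, 7); // slice "Hello, "
        handler.characters(buffer, 9, 6); // slice "world."
        return handler.collected();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Hello, world."
    }
}
```

Appending with the three-argument append avoids copying the padding characters that surround the slice in the shared buffer.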

getExtractedText

public java.lang.String getExtractedText()
Return extracted text.

Returns:
The extracted text.