edu.northwestern.at.morphadorner
Class PseudoPageAdderFilter

java.lang.Object
  extended by org.xml.sax.helpers.XMLFilterImpl
      extended by edu.northwestern.at.utils.xml.ExtendedXMLFilterImpl
          extended by edu.northwestern.at.morphadorner.PseudoPageAdderFilter
All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler, org.xml.sax.XMLFilter, org.xml.sax.XMLReader

public class PseudoPageAdderFilter
extends ExtendedXMLFilterImpl

Filter to add pseudopage milestones to an adorned file.


Field Summary
protected  QueueStack<org.xml.sax.Attributes> attrStack
          Element attributes stack.
protected  QueueStack<java.lang.String> divStack
          Div tag stack.
protected  java.util.Set<java.lang.String> pseudoPageContainerDivTypes
          Pseudo-page ending div types.
protected  int pseudoPageCount
          Current pseudo page count.
protected  int pseudoPageSize
          Page size in number of tokens.
protected  boolean pseudoPageStarted
          True if a pseudo page has been started.
protected  int pseudoPageWordCount
          Current pseudo page word count.
protected  java.util.List<java.lang.String> tagList
          List of tags for determining node ancestry of each word.
 
Constructor Summary
PseudoPageAdderFilter(org.xml.sax.XMLReader reader, int pseudoPageSize, java.lang.String pageEndingDivTypes)
          Create a pseudo-page adder filter.
 
Method Summary
 void characters(char[] ch, int start, int length)
          Handle character data.
 PendingElement createPseudoPageElement(java.lang.String uri, boolean forcedEmit, boolean start, java.lang.String path)
          Create a pseudo page milestone.
 void emitPseudoPageElement(PendingElement pseudoPageElement)
          Emit a pseudo page milestone.
 void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
          Handle end of an element.
 void ignorableWhitespace(char[] ch, int start, int length)
          Handle whitespace.
 void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
          Handle start of an XML element.
 
Methods inherited from class edu.northwestern.at.utils.xml.ExtendedXMLFilterImpl
removeAttribute, setAttributeValue, setAttributeValue, setAttributeValue
 
Methods inherited from class org.xml.sax.helpers.XMLFilterImpl
endDocument, endPrefixMapping, error, fatalError, getContentHandler, getDTDHandler, getEntityResolver, getErrorHandler, getFeature, getParent, getProperty, notationDecl, parse, parse, processingInstruction, resolveEntity, setContentHandler, setDocumentLocator, setDTDHandler, setEntityResolver, setErrorHandler, setFeature, setParent, setProperty, skippedEntity, startDocument, startPrefixMapping, unparsedEntityDecl, warning
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

tagList

protected java.util.List<java.lang.String> tagList
List of tags for determining node ancestry of each word.


pseudoPageSize

protected int pseudoPageSize
Page size in number of tokens.


pseudoPageCount

protected int pseudoPageCount
Current pseudo page count.


pseudoPageWordCount

protected int pseudoPageWordCount
Current pseudo page word count.


pseudoPageStarted

protected boolean pseudoPageStarted
True if a pseudo page has been started.


divStack

protected QueueStack<java.lang.String> divStack
Div tag stack.


attrStack

protected QueueStack<org.xml.sax.Attributes> attrStack
Element attributes stack.


pseudoPageContainerDivTypes

protected java.util.Set<java.lang.String> pseudoPageContainerDivTypes
Pseudo-page ending div types.

Constructor Detail

PseudoPageAdderFilter

public PseudoPageAdderFilter(org.xml.sax.XMLReader reader,
                             int pseudoPageSize,
                             java.lang.String pageEndingDivTypes)
Create a pseudo-page adder filter.

Parameters:
reader - XML input reader to which this filter applies.
pseudoPageSize - Number of words in a pseudopage.
pageEndingDivTypes - div types that end a pseudopage.
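The general idea of wrapping a reader in a filter that inserts a milestone after a fixed number of words can be sketched with the JDK's SAX classes alone. The sketch below is a simplified stand-in, not the MorphAdorner implementation: the class name PageMilestoneSketch, the word element name w, the emitted element name milestone, and its n attribute are all assumptions made for illustration, and div-type handling is left out entirely.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Hypothetical stand-in filter; not the MorphAdorner class. */
class PageMilestoneSketch extends XMLFilterImpl {
    private final int pageSize;   // words per pseudo page
    private int wordsOnPage = 0;  // words seen since the last milestone
    private int pageCount = 0;    // milestones emitted so far

    PageMilestoneSketch(XMLReader parent, int pageSize) {
        super(parent);
        this.pageSize = pageSize;
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        // Forward the original end tag first.
        super.endElement(uri, localName, qName);

        // Assumption: each word is wrapped in a <w> element.
        if ("w".equals(qName) && ++wordsOnPage >= pageSize) {
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "n", "n", "CDATA", String.valueOf(++pageCount));
            // Emit an empty milestone element right after the closing </w>.
            super.startElement("", "milestone", "milestone", atts);
            super.endElement("", "milestone", "milestone");
            wordsOnPage = 0;
        }
    }

    /** Runs the sketch over an XML string and counts the milestones it emits. */
    static int countMilestones(String xml, int pageSize) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            PageMilestoneSketch filter = new PageMilestoneSketch(reader, pageSize);

            final int[] seen = {0};
            filter.setContentHandler(new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes a) {
                    if ("milestone".equals(qName)) {
                        seen[0]++;
                    }
                }
            });
            filter.parse(new InputSource(new StringReader(xml)));
            return seen[0];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With five words and a page size of two, the sketch emits a milestone after the second and fourth words and leaves the fifth on an unfinished page.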
Method Detail

startElement

public void startElement(java.lang.String uri,
                         java.lang.String localName,
                         java.lang.String qName,
                         org.xml.sax.Attributes atts)
                  throws org.xml.sax.SAXException
Handle start of an XML element.

Specified by:
startElement in interface org.xml.sax.ContentHandler
Overrides:
startElement in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
uri - The XML element's URI.
localName - The XML element's local name.
qName - The XML element's qname.
atts - The XML element's attributes.
Throws:
org.xml.sax.SAXException - If there is an error.

characters

public void characters(char[] ch,
                       int start,
                       int length)
                throws org.xml.sax.SAXException
Handle character data.

Specified by:
characters in interface org.xml.sax.ContentHandler
Overrides:
characters in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
ch - Array of characters.
start - The starting position in the array.
length - The number of characters.
Throws:
org.xml.sax.SAXException - If there is an error.

ignorableWhitespace

public void ignorableWhitespace(char[] ch,
                                int start,
                                int length)
                         throws org.xml.sax.SAXException
Handle whitespace.

Specified by:
ignorableWhitespace in interface org.xml.sax.ContentHandler
Overrides:
ignorableWhitespace in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
ch - Array of characters.
start - The starting position in the array.
length - The number of characters.
Throws:
org.xml.sax.SAXException - If there is an error.

endElement

public void endElement(java.lang.String uri,
                       java.lang.String localName,
                       java.lang.String qName)
                throws org.xml.sax.SAXException
Handle end of an element.

Specified by:
endElement in interface org.xml.sax.ContentHandler
Overrides:
endElement in class org.xml.sax.helpers.XMLFilterImpl
Parameters:
uri - The XML element's URI.
localName - The XML element's local name.
qName - The XML element's qname.
Throws:
org.xml.sax.SAXException - If there is an error.
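The tagList field is described as a list of tags used to determine the ancestry of each word. One plausible way to keep such a list in step with startElement and endElement calls is sketched below. The class AncestryTracker and the slash-separated path format are hypothetical and are not taken from the MorphAdorner source.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of MorphAdorner. */
final class AncestryTracker {
    // Open element names, outermost first.
    private final List<String> tags = new ArrayList<>();

    /** Call from startElement: record the newly opened element. */
    void push(String qName) {
        tags.add(qName);
    }

    /** Call from endElement: drop the most recently opened element. */
    void pop() {
        if (!tags.isEmpty()) {
            tags.remove(tags.size() - 1);
        }
    }

    /** Returns the current ancestry as a slash-separated path (assumed format). */
    String path() {
        return tags.isEmpty() ? "/" : "/" + String.join("/", tags);
    }
}
```

A path built this way could serve as the path argument of createPseudoPageElement, though whether the class actually does so is not stated here.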

createPseudoPageElement

public PendingElement createPseudoPageElement(java.lang.String uri,
                                              boolean forcedEmit,
                                              boolean start,
                                              java.lang.String path)
Create a pseudo page milestone.

Parameters:
uri - Element URI.
forcedEmit - Emit the pseudo page milestone even if not enough words have accumulated, as long as the current block contains at least one word.
start - true if starting milestone, false if ending.
path - Path attribute. May be null.
Returns:
The pseudo page element.
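The forcedEmit rule described above reduces to a small predicate. The following is a minimal sketch of that rule only, assuming the filter compares a running word count against the page size. The class PageBreakRule and the method shouldEmit are hypothetical names, not part of the documented API.

```java
/** Hypothetical helper; not part of MorphAdorner. */
final class PageBreakRule {
    private PageBreakRule() {
    }

    /**
     * Decides whether a pseudo page milestone should be emitted.
     *
     * @param wordCount  words accumulated in the current block
     * @param pageSize   target number of words per pseudo page
     * @param forcedEmit true to emit even when the page is not yet full
     */
    static boolean shouldEmit(int wordCount, int pageSize, boolean forcedEmit) {
        // An empty block never produces a milestone, forced or not.
        if (wordCount <= 0) {
            return false;
        }
        // Emit when the page is full, or when the caller forces it.
        return forcedEmit || wordCount >= pageSize;
    }
}
```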

emitPseudoPageElement

public void emitPseudoPageElement(PendingElement pseudoPageElement)
Emit a pseudo page milestone.

Parameters:
pseudoPageElement - The pseudo page element to emit.