XGParser (MorphAdorner)

java.lang.Object
- edu.northwestern.at.morphadorner.xgtagger.XGParser

```
public class XGParser
extends java.lang.Object
```
Parse XML document for morphological adornment.

Author:

Aude Garnier, Xavier Tannier

Field Summary

Fields
| Modifier and Type | Field and Description |
| --- | --- |
| `(package private) java.util.List` | `adornedWordDataList` List of adorned word data entries. |
| `(package private) AdornedWordOutputter` | `adornerOutputter` |
| `(package private) boolean` | `boolDot` |
| `(package private) java.io.BufferedReader` | `brCurrent` |
| `(package private) static java.lang.String` | `FILE_SEPARATOR` File separator. |
| `(package private) UnicodeReader` | `frCurrent` |
| `(package private) java.util.Map<java.lang.Integer,XGPair>` | `hMap` |
| `(package private) java.util.Map<java.lang.String,java.lang.String>` | `hmAttributes` |
| `(package private) int` | `intCountNonBlanks` |
| `(package private) int` | `intCountTags` |
| `(package private) int` | `intCpt` |
| `(package private) int` | `intID` |
| `(package private) int` | `intLongWord` |
| `(package private) int` | `intStrWordIndex` |
| `(package private) int` | `intStrWordLength` |
| `(package private) int` | `nextAdornedWord` Next adorned word to process. |
| `(package private) org.w3c.dom.NamedNodeMap` | `nnmEntities` |
| `(package private) XGOptions` | `options` |
| `(package private) java.lang.StringBuffer` | `sbWord` |
| `(package private) java.util.Map<java.lang.Integer,java.lang.Integer>` | `splitWords` Map of multipart word IDs to # of parts. |
| `(package private) java.lang.String` | `strLine` |
| `(package private) java.lang.String` | `strWord` |
| `(package private) java.lang.String` | `surroundMarker` Surrounding sentence/phrase marker. |
| `(package private) int` | `surroundMarkerLength` Surround marker string length. |
| `(package private) java.lang.String` | `surroundMarkerTrim` |
| `(package private) int` | `wordNodesCreated` Number of word nodes created. |

Constructor Summary

Constructors
Constructor and Description

`XGParser(XGOptions options, org.w3c.dom.Document document)` Create parser.

Method Summary

Methods
| Modifier and Type | Method and Description |
| --- | --- |
| `protected org.w3c.dom.Node` | `cloneEntityReference(org.w3c.dom.EntityReference er, org.w3c.dom.Document doc)` Clone a read-only EntityReference into a writable Node. |
| `protected static org.w3c.dom.Node` | `cloneNode(org.w3c.dom.Node node)` Clone a node and its sub-elements. |
| `protected int` | `countNonBlankCharacters(java.lang.String strString)` Count non-blank characters in a `String` and update the tag `HashMap`. |
| `protected int` | `createNewNode(org.w3c.dom.Document doc, org.w3c.dom.Node node, org.w3c.dom.Node nodeChild, java.lang.String strCurrentPath, java.lang.Integer integerTagNumber)` Create new document node. |
| `java.lang.StringBuffer` | `extractText(org.w3c.dom.Node node)` Extract text from `node`. |
| `static java.lang.Object[]` | `extractText(XGOptions options, org.w3c.dom.Document document)` Extract text from DOM document. |
| `protected void` | `getNextEntry()` Reads the next entry from the adorner and updates the corresponding class variables. |
| `int` | `getNumberOfAdornedWords()` Get number of adorned words. |
| `int` | `getRunningWordID()` Get running word ID. |
| `static boolean` | `isPunctuationAndNotGap(java.lang.String s)` Check if string is punctuation, but not a gap span. |
| `static java.util.Map<java.lang.Integer,java.lang.Integer>` | `mergeAdornments(XGOptions options, XGParser instance, org.w3c.dom.Document document, java.lang.String segmentName, AdornedWordOutputter outputter, TextInputter inputter)` Merge adornments with the original XML text. |
| `org.w3c.dom.Document` | `modifyDOM(org.w3c.dom.Node node, org.w3c.dom.Document doc, java.lang.String strCurrentPath)` Modify `node` to add adornments and remove the initial text node. |
| `protected int` | `read()` Reads an integer from the adorner. |
| `void` | `setRunningWordID(int runningWordID)` Set running word ID. |
| `static org.w3c.dom.Document` | `textToDOM(XGOptions options, java.lang.String xmlText)` Create DOM from XML text. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - options
```
XGOptions options
```
  - hMap
```
java.util.Map<java.lang.Integer,XGPair> hMap
```
  - hmAttributes
```
java.util.Map<java.lang.String,java.lang.String> hmAttributes
```
  - nnmEntities
```
org.w3c.dom.NamedNodeMap nnmEntities
```
  - boolDot
```
boolean boolDot
```
  - intCountNonBlanks
```
int intCountNonBlanks
```
  - intCountTags
```
int intCountTags
```
  - intCpt
```
int intCpt
```
  - strLine
```
java.lang.String strLine
```
  - sbWord
```
java.lang.StringBuffer sbWord
```
  - intStrWordIndex
```
int intStrWordIndex
```
  - intStrWordLength
```
int intStrWordLength
```
  - strWord
```
java.lang.String strWord
```
  - intLongWord
```
int intLongWord
```
  - intID
```
int intID
```
  - frCurrent
```
UnicodeReader frCurrent
```
  - brCurrent
```
java.io.BufferedReader brCurrent
```
  - adornerOutputter
```
AdornedWordOutputter adornerOutputter
```
  - nextAdornedWord
```
int nextAdornedWord
```
    Next adorned word to process.
  - adornedWordDataList
```
java.util.List adornedWordDataList
```
    List of adorned word data entries.
  - surroundMarker
```
java.lang.String surroundMarker
```
    Surrounding sentence/phrase marker.
  - surroundMarkerTrim
```
java.lang.String surroundMarkerTrim
```
  - surroundMarkerLength
```
int surroundMarkerLength
```
    Surround marker string length.
  - splitWords
```
java.util.Map<java.lang.Integer,java.lang.Integer> splitWords
```
    Map of multipart word IDs to number of parts.
    For each word split by soft or jump tags, records the ID of that word and the number of parts into which it is split.
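    To illustrate the shape of such a map, the sketch below tallies how many parts were emitted for each word ID and keeps only the words that were actually split. The class `SplitWordsSketch`, the method `countParts`, and the input array are hypothetical and not part of MorphAdorner; how `XGParser` itself populates `splitWords` is not described on this page.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not a MorphAdorner class.
class SplitWordsSketch {
    /**
     * Takes the word ID attached to each emitted word part, in document
     * order, and returns a map from word ID to number of parts. Words that
     * appear as a single part are not split, so they are left out.
     */
    static Map<Integer, Integer> countParts(int[] wordIdOfEachPart) {
        Map<Integer, Integer> parts = new TreeMap<>();
        for (int wordId : wordIdOfEachPart) {
            // Add one to the running count for this word ID.
            parts.merge(wordId, 1, Integer::sum);
        }
        // Drop entries for words that were never split.
        parts.values().removeIf(count -> count < 2);
        return parts;
    }
}
```

    With part IDs `1, 2, 2, 3`, only word `2` ends up in the map, with a part count of 2.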
  - wordNodesCreated
```
int wordNodesCreated
```
    Number of word nodes created.
  - FILE_SEPARATOR
```
static final java.lang.String FILE_SEPARATOR
```
    File separator.
- Constructor Detail
  - XGParser
```
public XGParser(XGOptions options,
        org.w3c.dom.Document document)
```
    Create parser.
    
    Parameters:
    options - Options for processing.
    document - Document to process.
- Method Detail
  - setRunningWordID
```
public void setRunningWordID(int runningWordID)
```
    Set running word ID.
    
    Parameters:
    runningWordID - The running word ID.
  - getRunningWordID
```
public int getRunningWordID()
```
    Get running word ID.
    
    Returns:
    The current running word ID.
  - getNumberOfAdornedWords
```
public int getNumberOfAdornedWords()
```
    Get number of adorned words.
    
    Returns:
    Number of adorned words.
  - read
```
protected int read()
            throws java.io.IOException,
                   java.io.FileNotFoundException
```
    Reads an integer from the adorner.
    
    Returns:
    The next int in the output stream.
    If this output is split across several files, multiple buffers are handled.
    
    Throws:
    
    java.io.IOException
    
    java.io.FileNotFoundException
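    One common way to present output that is split across several sources as a single stream is to chain the readers and move to the next one when the current one is exhausted. The sketch below shows that general technique with standard `java.io` classes. The class `ChainedReaderSketch` is hypothetical, and this page does not say that `XGParser.read()` is implemented this way.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration; not the MorphAdorner implementation.
class ChainedReaderSketch {
    private final Iterator<? extends Reader> sources;
    private Reader current;

    ChainedReaderSketch(List<? extends Reader> readers) {
        this.sources = readers.iterator();
        this.current = sources.hasNext() ? sources.next() : null;
    }

    /** Returns the next character as an int, or -1 once every source is exhausted. */
    int read() throws IOException {
        while (current != null) {
            int c = current.read();
            if (c != -1) {
                return c;
            }
            // Current source is finished: close it and advance to the next one.
            current.close();
            current = sources.hasNext() ? sources.next() : null;
        }
        return -1;
    }
}
```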
  - getNextEntry
```
protected void getNextEntry()
                     throws java.io.IOException,
                            java.io.FileNotFoundException
```
    Reads the next entry from the adorner and updates the corresponding class variables.
    
    Throws:
    
    java.io.IOException
    
    java.io.FileNotFoundException
  - extractText
```
public java.lang.StringBuffer extractText(org.w3c.dom.Node node)
                                   throws java.io.IOException
```
    Extract text from node.
    
    Parameters:
    node - the Node to parse.
    
    Returns:
    A StringBuffer containing the element text, taking reading context into account.
    The algorithm used to parse children (soft, jump, hard tags) is the same as that in modifyDOM(org.w3c.dom.Node, org.w3c.dom.Document, java.lang.String).
    
    Throws:
    
    java.io.IOException
  - createNewNode
```
protected int createNewNode(org.w3c.dom.Document doc,
                org.w3c.dom.Node node,
                org.w3c.dom.Node nodeChild,
                java.lang.String strCurrentPath,
                java.lang.Integer integerTagNumber)
```
    Create new document node.
    
    Parameters:
    doc - The document we're processing.
    node - The current node we're processing.
    nodeChild - The child node we're processing.
    strCurrentPath - Current XML path to this node.
    integerTagNumber - Integer tag number for path.
    
    Returns:
    Number of string word elements generated.
  - cloneNode
```
protected static org.w3c.dom.Node cloneNode(org.w3c.dom.Node node)
```
    Clone a node and its sub-elements.
    
    Parameters:
    node - The Node to clone.
    
    Returns:
    The cloned Node.
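    For comparison, the standard DOM API already provides a deep copy through `Node.cloneNode(true)`, which duplicates a node together with its attributes and descendants and returns a copy that has no parent. The sketch below uses only that standard call. The class `DeepCloneSketch` and its methods are hypothetical, and the page does not state whether `XGParser.cloneNode` delegates to it.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Hypothetical illustration built on the standard org.w3c.dom API.
class DeepCloneSketch {
    /** Deep-copies a node: passing true copies the whole subtree. */
    static Node deepClone(Node node) {
        return node.cloneNode(true);
    }

    /** Creates an empty document so the example can build nodes to clone. */
    static Document newDocument() throws ParserConfigurationException {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    }
}
```

    Because the copy is detached, changing its text or attributes leaves the original node untouched until the copy is inserted somewhere in a document.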
  - cloneEntityReference
```
protected org.w3c.dom.Node cloneEntityReference(org.w3c.dom.EntityReference er,
                                    org.w3c.dom.Document doc)
```
    Clone a read-only EntityReference into a writable Node.
    
    Parameters:
    er - The EntityReference to clone.
    doc - The parent Document.
    
    Returns:
    A writable Node containing the same sub-elements as er.
  - modifyDOM
```
public org.w3c.dom.Document modifyDOM(org.w3c.dom.Node node,
                             org.w3c.dom.Document doc,
                             java.lang.String strCurrentPath)
                               throws org.w3c.dom.DOMException,
                                      java.io.IOException
```
    Modify node to add adornments and remove the initial text node.
    
    Parameters:
    node - The Node to parse.
    doc - The Document to modify.
    strCurrentPath - The XPath of the last Node explored.
    
    Returns:
    Modified Document.
    The algorithm used to parse children (soft, jump, hard tags) is the same as used in extractText(org.w3c.dom.Node).
    
    Throws:
    
    org.w3c.dom.DOMException
    
    java.io.IOException
  - countNonBlankCharacters
```
protected int countNonBlankCharacters(java.lang.String strString)
                               throws java.io.IOException
```
    Count non-blank characters in a String and update the tag HashMap.
    
    Parameters:
    strString - The text to analyze.
    
    Returns:
    Number of non-blank characters in strString.
    strString should have all whitespace characters mapped to blanks before this method is called.
    
    Throws:
    
    java.io.IOException
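    The precondition above (whitespace mapped to blanks before counting) can be sketched as follows. This covers only the counting step; the update of the tag `HashMap` is not documented here and is left out. The class `NonBlankCountSketch` and both method names are hypothetical.

```java
// Hypothetical illustration of the counting step only.
class NonBlankCountSketch {
    /** Replaces each whitespace character matched by the regex \s with a blank. */
    static String mapWhitespaceToBlanks(String text) {
        return text.replaceAll("\\s", " ");
    }

    /** Counts characters other than the blank ' ' in already-normalized text. */
    static int countNonBlank(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) != ' ') {
                count++;
            }
        }
        return count;
    }
}
```

    Skipping the normalization step would make tabs and newlines count as non-blank characters, which is why the documented precondition matters.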
  - extractText
```
public static java.lang.Object[] extractText(XGOptions options,
                             org.w3c.dom.Document document)
                                      throws java.io.IOException
```
    Extract text from DOM document.
    
    Parameters:
    options - The processing options.
    document - The document to process.
    
    Returns:
    Two-element object array: result[0] is the XGParser instance, and result[1] is the reading context text.
    
    Throws:
    
    java.io.IOException
  - mergeAdornments
```
public static java.util.Map<java.lang.Integer,java.lang.Integer> mergeAdornments(XGOptions options,
                                                                 XGParser instance,
                                                                 org.w3c.dom.Document document,
                                                                 java.lang.String segmentName,
                                                                 AdornedWordOutputter outputter,
                                                                 TextInputter inputter)
                                                                          throws java.io.IOException
```
    Merge adornments with the original XML text.
    
    Parameters:
    options - XGTagger options.
    instance - XGParser instance.
    document - Document being processed.
    segmentName - Name of document segment being processed.
    outputter - Adorned word outputter.
    inputter - Text inputter.
    
    Returns:
    Map of (word id, # of word parts) for words split by soft or jump tags.
    
    Throws:
    
    java.io.IOException
  - textToDOM
```
public static org.w3c.dom.Document textToDOM(XGOptions options,
                             java.lang.String xmlText)
                                      throws java.io.IOException
```
    Create DOM from XML text.
    
    Parameters:
    options - The processing options.
    xmlText - The XML text.
    
    Returns:
    DOM for document.
    
    Throws:
    
    java.io.IOException
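    As a point of reference, turning XML text into a DOM with the standard JAXP API can be sketched as below. The class `TextToDomSketch` and its `parse` method are hypothetical. The sketch ignores `XGOptions` entirely, and wrapping parser failures in an `IOException` is an assumption chosen to mirror the declared `throws` clause, not documented behavior.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical illustration using standard JAXP calls only.
class TextToDomSketch {
    /** Parses XML text held in memory into a DOM Document. */
    static Document parse(String xmlText) throws IOException {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Read from the string directly instead of from a file or URL.
            return builder.parse(new InputSource(new StringReader(xmlText)));
        } catch (ParserConfigurationException | SAXException e) {
            // Assumption: surface parser problems as IOException.
            throw new IOException("Could not parse XML text", e);
        }
    }
}
```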
  - isPunctuationAndNotGap
```
public static boolean isPunctuationAndNotGap(java.lang.String s)
```
    Check if string is punctuation, but not a gap span.
    
    Parameters:
    s - String to check for being punctuation.
    
    Returns:
    true if string is punctuation and not gap span, false otherwise.
