public class AdornedXMLReader
extends java.lang.Object
Modifier and Type | Class and Description |
---|---|
class | AdornedXMLReader.MySentenceMelder - Custom sentence melder. |
Modifier and Type | Field and Description |
---|---|
protected ExtendedAdornedWordFilter | wordInfoFilter - Adorned word information filter. |
Constructor and Description |
---|
AdornedXMLReader(java.lang.String xmlInputFileName) - Create adorned XML reader. |
Modifier and Type | Method and Description |
---|---|
java.util.List<java.lang.String> | findWordsByMatchingLeadingPath(java.lang.String pattern) - Find words whose paths start with a given string. |
java.util.List<java.lang.String> | findWordsByMatchingPath(java.lang.String pattern) - Find words matching a specified path regular expression pattern. |
java.lang.String | generateXML(java.lang.String startingWordID, java.lang.String endingWordID) - Generate XML for selected word IDs. |
java.util.List<java.lang.String> | getAdornedWordIDs() - Return list of adorned word IDs. |
int | getAdornedWordIndexByID(java.lang.String id) - Get index for a word ID. |
ExtendedAdornedWord | getExtendedAdornedWord(int index) - Get adorned word information for a word index. |
ExtendedAdornedWord | getExtendedAdornedWord(java.lang.String id) - Get adorned word information for a word ID. |
java.util.List<java.lang.String> | getRelatedSplitWordIDs(java.lang.String wordID) - Get related adorned word IDs for a word ID of a split word. |
java.util.List<ExtendedAdornedWord> | getRelatedSplitWords(ExtendedAdornedWord adornedWordInfo) - Get related adorned words for split words. |
java.util.List<java.lang.String> | getSelectedWordIDs(java.lang.String startingWordID, java.lang.String endingWordID) - Get list of selected word IDs from specified ID range. |
java.util.List<java.util.List<ExtendedAdornedWord>> | getSentences() - Get adorned words as a list of sentences. |
java.util.List<java.lang.String> | getSiblingWordIDs(java.lang.String wordID) - Get sibling words. |
protected void | outputTag(java.lang.String tag, boolean openingTag, AdornedXMLReader.MySentenceMelder melder, XMLTagClassifier tagClassifier) - Output an XML tag. |
protected void | readXML(java.lang.String xmlInputFileName) - Reads adorned XML. |
java.lang.String[] | splitPath(java.lang.String path) - Split word path into separate tags. |
java.lang.String[] | splitPathFull(java.lang.String path) - Split word path into separate tags. |
java.lang.String | trimTag(java.lang.String tag) - Trim tag number from XML tag. |
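The summary lists two path lookups: findWordsByMatchingLeadingPath takes a leading string, while findWordsByMatchingPath takes a regular expression. The sketch below illustrates that difference on an in-memory map. It is not the AdornedXMLReader implementation: the class PathMatchSketch, its method names, the slash-separated path format, the word IDs, and the choice of a full-string regex match are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical stand-in; not the AdornedXMLReader implementation.
class PathMatchSketch {
    // Word ID -> word path; insertion order stands in for document order.
    private final Map<String, String> pathsByWordID = new LinkedHashMap<>();

    void addWord(String wordID, String path) {
        pathsByWordID.put(wordID, path);
    }

    // Prefix match: plain string comparison, no regular expression involved.
    List<String> findByLeadingPath(String prefix) {
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : pathsByWordID.entrySet()) {
            if (entry.getValue().startsWith(prefix)) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }

    // Regex match against the whole path (assumption: full match, not a search).
    List<String> findByPathRegex(String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> ids = new ArrayList<>();
        for (Map.Entry<String, String> entry : pathsByWordID.entrySet()) {
            if (pattern.matcher(entry.getValue()).matches()) {
                ids.add(entry.getKey());
            }
        }
        return ids;
    }
}
```

With paths in this assumed format, a prefix such as "/text/body/" selects every word under the body, whereas a regular expression must escape the bracketed tag numbers, for example `.*/p\[\d+\]/w\[\d+\]` to select words that sit directly inside a paragraph.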
protected ExtendedAdornedWordFilter wordInfoFilter
public AdornedXMLReader(java.lang.String xmlInputFileName) throws org.xml.sax.SAXException, java.io.IOException
xmlInputFileName
- Input XML file name.
org.xml.sax.SAXException
java.io.IOException
protected void readXML(java.lang.String xmlInputFileName) throws org.xml.sax.SAXException, java.io.IOException
xmlInputFileName
- XML input file name.
org.xml.sax.SAXException
java.io.IOException
public java.util.List<java.lang.String> getAdornedWordIDs()
public java.util.List<java.util.List<ExtendedAdornedWord>> getSentences()
This method tries to return sentences as close to their order of appearance in the text as possible. Sentences from intrusive jump tags generally appear after the text section into which they intrude, so they may be displaced an arbitrary distance from their actual position in the text.
public ExtendedAdornedWord getExtendedAdornedWord(java.lang.String id)
id
- The String word ID.
public ExtendedAdornedWord getExtendedAdornedWord(int index)
index
- The word index.
public java.util.List<java.lang.String> getRelatedSplitWordIDs(java.lang.String wordID)
wordID
- Word ID for which related IDs are wanted.
Related word IDs are the word IDs for the other parts of a split word. The returned list includes the given wordID.
For unsplit words, the single given wordID is returned in the list.
Null is returned when the wordID does not exist.
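The three cases described above (split word, unsplit word, unknown ID) can be sketched with a small in-memory index. This is only an illustration of the documented contract, not the reader's actual bookkeeping: the class SplitWordIndexSketch, its addWord helper, and the word IDs used with it are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the reader's split-word bookkeeping.
class SplitWordIndexSketch {
    // Maps each word ID to the IDs of all parts of the same word, in order.
    private final Map<String, List<String>> partsByWordID = new HashMap<>();

    // Registers a word made of one or more parts (a single part means unsplit).
    void addWord(String... partIDs) {
        List<String> parts =
            Collections.unmodifiableList(new ArrayList<>(Arrays.asList(partIDs)));
        for (String id : partIDs) {
            partsByWordID.put(id, parts);
        }
    }

    // Mirrors the documented contract: every part ID including the one given,
    // a one-element list for an unsplit word, and null for an unknown ID.
    List<String> getRelatedSplitWordIDs(String wordID) {
        return partsByWordID.get(wordID);
    }
}
```

Because an unknown ID yields null rather than an empty list, callers should check the result for null before iterating over it.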
public java.util.List<ExtendedAdornedWord> getRelatedSplitWords(ExtendedAdornedWord adornedWordInfo)
adornedWordInfo
- Adorned word for which related words are wanted.
Related words are those corresponding to the parts of a split word. The returned list includes the given word.
For unsplit words, the single given adorned word is returned in the list.
public int getAdornedWordIndexByID(java.lang.String id)
id
- The String word ID.
public java.util.List<java.lang.String> getSiblingWordIDs(java.lang.String wordID)
wordID
- The word ID of the word for which to find siblings.
Sibling words have the same parent hard or jump tag.
public java.util.List<java.lang.String> findWordsByMatchingLeadingPath(java.lang.String pattern)
pattern
- The pattern to match.
public java.util.List<java.lang.String> findWordsByMatchingPath(java.lang.String pattern)
pattern
- The regular expression pattern to match.
public java.lang.String trimTag(java.lang.String tag)
tag
- XML tag to trim.
public java.lang.String[] splitPathFull(java.lang.String path)
path
- The word path.
public java.lang.String[] splitPath(java.lang.String path)
path
- The word path.
public java.util.List<java.lang.String> getSelectedWordIDs(java.lang.String startingWordID, java.lang.String endingWordID)
startingWordID
- Starting word ID.
endingWordID
- Ending word ID.
public java.lang.String generateXML(java.lang.String startingWordID, java.lang.String endingWordID)
startingWordID
- Starting word ID.
endingWordID
- Ending word ID.
protected void outputTag(java.lang.String tag, boolean openingTag, AdornedXMLReader.MySentenceMelder melder, XMLTagClassifier tagClassifier)
tag
- The XML tag to output.
openingTag
- True to generate opening tag, false otherwise.
melder
- XML sentence melder.
tagClassifier
- XML tag classifier.
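Both getSelectedWordIDs and generateXML, documented above, take a starting and an ending word ID. One plausible reading of that range is sketched here against an ordered ID list such as the one getAdornedWordIDs returns. The class WordRangeSketch and its selectRange method are hypothetical, and the inclusive end point and the empty result for unknown or reversed IDs are assumptions; the documentation does not state how the real methods treat those cases.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; not part of AdornedXMLReader.
class WordRangeSketch {
    // Returns the IDs from startID through endID, both inclusive (assumed),
    // in the order they occur in orderedIDs.
    static List<String> selectRange(List<String> orderedIDs, String startID, String endID) {
        int start = orderedIDs.indexOf(startID);
        int end = orderedIDs.indexOf(endID);
        // Assumption: unknown IDs or a reversed range select nothing.
        if (start < 0 || end < 0 || start > end) {
            return Collections.emptyList();
        }
        // Copy so the result does not alias the caller's list.
        return new ArrayList<>(orderedIDs.subList(start, end + 1));
    }
}
```

The linear indexOf lookups keep the sketch short; with the real class, getAdornedWordIndexByID offers a lookup from word ID to index.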