public class XMLToTab
extends java.lang.Object
Usage:
java edu.northwestern.at.morphadorner.tools.xmltotab.XMLToTab input.xml output.tab
input.xml -- input XML file.
output.tab -- output tab-separated values file.
The attribute values for each "<w>" and "<pc>" element in the input XML file are extracted and written to a tab-separated values text file. Each output line holds the information for a single "<w>" or "<pc>" element.
This tabular representation of an adorned XML text is useful for data checking. The morphological attribute values for each word element appear as columns. The 80 or so characters of text on either side of each word let you focus on particular part-of-speech tags and pinpoint errors from the automatic adornment process. The tab-separated values may also be used to construct spreadsheets or databases of the individual word information.
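The extraction described above can be illustrated with a minimal sketch. It is not the XMLToTab implementation: it uses the JDK's StAX parser rather than AdornedXMLReader, the class and method names are hypothetical, it emits every attribute it finds instead of a fixed set of columns, and it assumes each "<w>" and "<pc>" element contains only text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Illustrative sketch only; hypothetical class, not part of MorphAdorner. */
final class WordElementsToTabSketch {

    /**
     * Returns one tab-separated line per w or pc element:
     * element name, then each attribute value, then the element text.
     */
    static List<String> toTabLines(String xml) {
        List<String> lines = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() != XMLStreamConstants.START_ELEMENT) {
                        continue;
                    }
                    String name = reader.getLocalName();
                    if (!name.equals("w") && !name.equals("pc")) {
                        continue;
                    }
                    StringBuilder line = new StringBuilder(name);
                    for (int i = 0; i < reader.getAttributeCount(); i++) {
                        line.append('\t').append(reader.getAttributeValue(i));
                    }
                    // getElementText() reads up to the matching end tag and
                    // fails if the element has child elements (assumed absent).
                    line.append('\t').append(reader.getElementText());
                    lines.add(line.toString());
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML input", e);
        }
        return lines;
    }
}
```

Each returned line can be written directly to the output file or loaded into a spreadsheet. A real converter would fix the column order by attribute name so that every row lines up.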
Constructor and Description |
---|
XMLToTab(java.lang.String[] args): Supervises conversion of XML word elements to tabular form. |
Modifier and Type | Method and Description |
---|---|
static void | displayUsage(): Display brief program usage. |
protected static java.lang.String | fixPath(java.lang.String path): Trim work ID and word number from word path. |
static java.lang.String[] | getKWIC(java.lang.String id, int KWICWidth, java.util.List<java.lang.String> idList, AdornedXMLReader xmlReader): Generate a KWIC line for a spelling. |
static void | main(java.lang.String[] args): Main program. |
public XMLToTab(java.lang.String[] args)
args
- Command line arguments.
public static void main(java.lang.String[] args)
public static void displayUsage()
protected static java.lang.String fixPath(java.lang.String path)
path
- Full path containing work ID and word number.
public static java.lang.String[] getKWIC(java.lang.String id, int KWICWidth, java.util.List<java.lang.String> idList, AdornedXMLReader xmlReader)
id
- ID of the word for which to generate a KWIC.
KWICWidth
- Maximum width (in characters) of KWIC text.
idList
- List of word IDs.
xmlReader
- The XML reader.
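The idea behind a width-limited KWIC (keyword in context) line can be sketched as follows. This is an assumption-laden illustration, not the getKWIC implementation: it works on a plain list of word strings instead of word IDs and an AdornedXMLReader, the class and method names are hypothetical, and the three-element result (left context, keyword, right context) is an assumed layout that this page does not document.

```java
import java.util.List;

/** Illustrative sketch only; hypothetical class, not part of MorphAdorner. */
final class KwicSketch {

    /**
     * Returns {left context, keyword, right context}. Each context holds
     * whole words only and is at most width characters long.
     */
    static String[] kwic(List<String> words, int index, int width) {
        // Walk backwards from the keyword, prepending words while they fit.
        StringBuilder left = new StringBuilder();
        for (int i = index - 1; i >= 0; i--) {
            String word = words.get(i);
            int needed = word.length() + (left.length() > 0 ? 1 : 0);
            if (left.length() + needed > width) {
                break;
            }
            if (left.length() > 0) {
                left.insert(0, ' ');
            }
            left.insert(0, word);
        }

        // Walk forwards from the keyword, appending words while they fit.
        StringBuilder right = new StringBuilder();
        for (int i = index + 1; i < words.size(); i++) {
            String word = words.get(i);
            int needed = word.length() + (right.length() > 0 ? 1 : 0);
            if (right.length() + needed > width) {
                break;
            }
            if (right.length() > 0) {
                right.append(' ');
            }
            right.append(word);
        }

        return new String[] { left.toString(), words.get(index), right.toString() };
    }
}
```

Placing the three parts in separate columns of the tabular output keeps the keyword aligned when the rows are sorted, for example by part-of-speech tag.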