Class | Description |
---|---|
StripWordAttributes | Create derived MorphAdorner file with word elements stripped of attributes. |
Create derived MorphAdorner file with word elements stripped of attributes.
Usage:
java edu.northwestern.at.morphadorner.tools.stripwordattributes.StripWordAttributes input.xml output.xml output.tab [/[no]id] [/[no]trim]
input.xml | Input XML file adorned by MorphAdorner. |
output.xml | Derived adorned file with word element attributes stripped. |
output.tab | Tab-delimited file of word element attribute values. |
/id or /noid | Optional parameter indicating whether the xml:id attribute should be left attached to each word (<w>) element. Default is /noid, which removes the xml:id attribute and its value. |
/trim or /notrim | Optional parameter indicating whether whitespace should be trimmed from the start and end of each XML text line. Default is /notrim, which leaves the original whitespace intact. |
The derived adorned output file "output.xml" has all attributes stripped from each <w> element, except that xml:id is retained when /id is specified.
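To make the stripping step concrete, here is a minimal sketch using the JDK's built-in DOM classes. It is not the tool's actual implementation, only an illustration of the effect described above. The class name `StripSketch`, the method `strip`, and the sample attributes `lem` and `pos` are hypothetical names chosen for the example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Illustrative only: mimics the documented effect of /id and /noid on <w> elements.
public class StripSketch {

    /** Removes attributes from every <w> element; keeps xml:id when keepId is true. */
    static String strip(String xml, boolean keepId) throws Exception {
        // The default factory is not namespace-aware, so "xml:id" is a plain attribute name.
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));

        NodeList words = doc.getElementsByTagName("w");
        for (int i = 0; i < words.getLength(); i++) {
            Element w = (Element) words.item(i);
            NamedNodeMap attrs = w.getAttributes();
            // Walk backwards because the attribute map shrinks as entries are removed.
            for (int j = attrs.getLength() - 1; j >= 0; j--) {
                String name = attrs.item(j).getNodeName();
                if (keepId && name.equals("xml:id")) {
                    continue;
                }
                w.removeAttribute(name);
            }
        }

        // Serialize the modified tree back to a string without an XML declaration.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Sample input; the attribute names lem and pos are assumptions for illustration.
        String xml = "<s><w xml:id=\"a-1\" lem=\"the\" pos=\"dt\">The</w></s>";
        System.out.println(strip(xml, true));   // behaves like /id
        System.out.println(strip(xml, false));  // behaves like /noid
    }
}
```

A DOM round trip does not preserve the original whitespace and formatting byte for byte, so this sketch does not model the /trim and /notrim options.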
The attribute values for each <w> element in the input.xml file are extracted and written to the tab-separated values file output.tab. The order of the attribute lines matches the order in which the <w> elements appear in output.xml. When /id is used, the xml:id value in each <w> element in output.xml can be matched with the corresponding xml:id value in output.tab.
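The following sketch shows one way to use output.tab for such matching when the file was produced with /id. It reads the header line for column names, then indexes each row by its identifier so that a <w> element's xml:id can be used as a lookup key. The class name `TabIndexSketch`, the method `indexById`, and the column header `xml:id` are assumptions; check the first line of a real output.tab for the actual column name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: indexes rows of a tab-separated attribute file by an id column.
public class TabIndexSketch {

    /**
     * Reads a header line followed by data lines and returns
     * id -> (column name -> value), preserving file order.
     */
    static Map<String, Map<String, String>> indexById(BufferedReader in, String idColumn)
            throws IOException {
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        String header = in.readLine();
        if (header == null) {
            return rows; // empty file
        }
        String[] names = header.split("\t", -1);

        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            // The -1 limit keeps trailing empty fields so columns stay aligned.
            String[] values = line.split("\t", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int i = 0; i < names.length && i < values.length; i++) {
                row.put(names[i], values[i]);
            }
            String id = row.get(idColumn);
            if (id != null) {
                rows.put(id, row);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Sample data; column names other than the id column are assumptions.
        String sample = "xml:id\tlem\tpos\na-1\tthe\tdt\na-2\tcat\tn1\n";
        Map<String, Map<String, String>> index =
            indexById(new BufferedReader(new StringReader(sample)), "xml:id");
        System.out.println(index.get("a-2"));
    }
}
```

For a real file, the reader could be created with `java.nio.file.Files.newBufferedReader` instead of a `StringReader`.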
The first line in output.tab contains the attribute names for each column. Each subsequent line in the output.tab file contains at least the following information corresponding to a single word "<w>" element. Some adorned files may add extra word attributes, resulting in more columns.