MorphAdorner: Stripping Word Attributes

StripWordAttributes creates a derived MorphAdorner XML file with word elements stripped of attributes.

Usage:

stripwordattributes input.xml output.xml output.tab [/[no]id] [/[no]trim]

where

input.xml	Input MorphAdorned xml file.
output.xml	Derived adorned file with word element attributes stripped.
output.tab	Tab delimited file of word element attribute values.
/id or /noid	Optional parameter indicating whether the xml:id attribute should be left attached to each word (<w>) element. Default is /noid, which removes the xml:id attribute and its value.
/trim or /notrim	Optional parameter indicating whether whitespace should be trimmed from the start and end of each XML text line. Default is /notrim, which leaves the original whitespace intact.
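
As an illustration of the usage line above, the following Python sketch assembles the command line and optionally runs it. It assumes that the stripwordattributes script is on the PATH and can be started directly; the function names build_command and strip_word_attributes and their keyword arguments are hypothetical helpers, not part of MorphAdorner.

```python
import subprocess


def build_command(input_xml, output_xml, output_tab, keep_id=False, trim=False):
    """Return the argument list for one stripwordattributes run."""
    return [
        "stripwordattributes",  # assumption: the script is on the PATH
        input_xml,
        output_xml,
        output_tab,
        "/id" if keep_id else "/noid",    # /noid is the documented default
        "/trim" if trim else "/notrim",   # /notrim is the documented default
    ]


def strip_word_attributes(input_xml, output_xml, output_tab, **options):
    """Run the tool and raise CalledProcessError if it exits with an error."""
    subprocess.run(
        build_command(input_xml, output_xml, output_tab, **options),
        check=True,
    )


# Show the command that would be run when xml:id values are kept.
print(" ".join(build_command("input.xml", "output.xml", "output.tab", keep_id=True)))
# prints: stripwordattributes input.xml output.xml output.tab /id /notrim
```

Passing both switches explicitly, even when they match the defaults, keeps the resulting command self-describing in logs.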

The derived adorned output file output.xml has all attributes stripped from each <w> tag, except for xml:id when /id is specified.

The attribute values for each <w> element in the input.xml file are extracted and written to the tab-separated values file output.tab. The lines in output.tab appear in the same order as the corresponding <w> elements in the XML output file. When /id is specified, the xml:id value of each <w> element in output.xml can be matched with the corresponding xml:id value in output.tab.
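
The matching on xml:id described above can be sketched in Python as follows. This is a minimal illustration, not MorphAdorner code: it assumes the file was produced with /id, that the header row of output.tab names the ID column "xml:id", and that fields are not quoted. The sample XML and tab data are made up and abbreviated to three columns; load_attributes and rejoin are hypothetical helper names.

```python
import csv
import io
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:id attribute under the XML namespace URI.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def load_attributes(tab_file):
    """Index the rows of output.tab by their xml:id value."""
    reader = csv.DictReader(tab_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    return {row["xml:id"]: row for row in reader}


def rejoin(root, attributes):
    """Return (xml:id, word text, attribute row) for every <w> element."""
    joined = []
    for elem in root.iter():
        # Compare local names so a namespaced <w> element also matches.
        if not isinstance(elem.tag, str) or elem.tag.rsplit("}", 1)[-1] != "w":
            continue
        word_id = elem.get(XML_ID)
        joined.append((word_id, "".join(elem.itertext()), attributes.get(word_id)))
    return joined


# Made-up sample data standing in for output.xml and output.tab.
SAMPLE_XML = '<text><w xml:id="ex-0010">Hello</w> <w xml:id="ex-0020">world</w></text>'
SAMPLE_TAB = "xml:id\tlem\tpos\nex-0010\thello\tuh\nex-0020\tworld\tn1\n"

attributes = load_attributes(io.StringIO(SAMPLE_TAB))
for word_id, text, row in rejoin(ET.fromstring(SAMPLE_XML), attributes):
    print(word_id, text, row["lem"], row["pos"])
```

For real files, ET.parse("output.xml").getroot() and open("output.tab", newline="", encoding="utf-8") would replace the in-memory samples; the UTF-8 encoding is an assumption.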

The first line in output.tab contains the attribute names for each column. Each subsequent line describes a single word (<w>) element and contains at least the following attribute values. Some adorned files may add extra word attributes, resulting in more columns.

xml:id -- the permanent word ID.
eos -- the end of sentence flag (1 if the word ends a sentence, 0 otherwise).
lem -- the lemma.
ord -- the word ordinal within the text (starts at 1).
part -- the word part flag: "N" for a word which is not split; "I" for the first part of a split word; "M" for the middle parts of a split word; and "F" for the final part of a split word.
pos -- the part of speech.
reg -- the standard spelling.
spe -- the corrected original spelling.
tok -- the original token.
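
As a sketch of how these columns might be used, the following Python example reads the tab-delimited rows by column name and groups words into sentences with the eos flag. It assumes the header row uses exactly the attribute names listed above and that fields are not quoted; the sample rows are made up, and read_rows and split_sentences are hypothetical helper names.

```python
import csv
import io


def read_rows(tab_file):
    """Read output.tab into a list of dicts keyed by the header row."""
    return list(csv.DictReader(tab_file, delimiter="\t", quoting=csv.QUOTE_NONE))


def split_sentences(rows, field="reg"):
    """Group the chosen column into sentences, closing one where eos is 1."""
    sentences, current = [], []
    for row in rows:
        current.append(row[field])
        if row["eos"] == "1":
            sentences.append(current)
            current = []
    if current:  # keep trailing words that have no closing eos flag
        sentences.append(current)
    return sentences


# Made-up sample rows covering the nine documented columns.
HEADER = ["xml:id", "eos", "lem", "ord", "part", "pos", "reg", "spe", "tok"]
SAMPLE_ROWS = [
    ["ex-0010", "0", "it", "1", "N", "pn", "It", "It", "It"],
    ["ex-0020", "1", "rain", "2", "N", "v", "rains", "raines", "raines"],
    ["ex-0030", "1", "stop", "3", "N", "v", "Stop", "Stop", "Stop"],
]
SAMPLE_TAB = "\n".join("\t".join(cells) for cells in [HEADER] + SAMPLE_ROWS) + "\n"

rows = read_rows(io.StringIO(SAMPLE_TAB))
print(split_sentences(rows))         # standard spellings
print(split_sentences(rows, "tok"))  # original tokens
```

Reading by column name rather than by position keeps the code working when an adorned file adds extra attribute columns.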
