Class | Description |
---|---|
AddCharacterOffsets | Create derived MorphAdorner files with character offsets to word tokens. |
Creates derived MorphAdorner files that record a character offset for each word token.
Usage:
java edu.northwestern.at.morphadorner.tools.addcharacteroffsets.AddCharacterOffsets adornedinput.xml adornedoutput.xml unadornedoutput.xml
Argument | Description |
---|---|
adornedinput.xml | Standard MorphAdorner adorned output file. |
adornedoutput.xml | Derived adorned file with character offsets added to each <w> tag. |
unadornedoutput.xml | Derived unadorned file whose word offsets are given in the adornedoutput.xml file. |
The derived adorned output file adornedoutput.xml adds a cof= attribute to each <w> tag. The cof= attribute specifies the character (not byte) offset of each word in the unadornedoutput.xml file. The latter file removes the <w> and <c> tags from the adorned input file and outputs the word and whitespace text as specified by the <w> and <c> tags.
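The relationship between the two derived files can be checked with a short program. The following is a minimal sketch, not part of MorphAdorner: the class name `CofCheck` and the method `mismatches` are hypothetical, and it assumes that cof= is a zero-based offset counted from the first character of the unadorned file's text (markup included), measured in Java `char` units. If the real offsets are one-based or counted differently, the comparison needs adjusting.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical helper: checks cof= offsets against the unadorned text. */
public class CofCheck {

    /**
     * Returns the text of every <w> element whose cof= value does not
     * point at that word in the unadorned text.
     *
     * @param adornedXml    contents of adornedoutput.xml
     * @param unadornedText contents of unadornedoutput.xml, read as characters
     */
    public static List<String> mismatches(String adornedXml, String unadornedText)
            throws Exception {
        NodeList words = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(adornedXml)))
                .getElementsByTagName("w");

        List<String> bad = new ArrayList<>();
        for (int i = 0; i < words.getLength(); i++) {
            Element w = (Element) words.item(i);
            String word = w.getTextContent();
            // Assumption: cof= is zero-based; a missing attribute throws NumberFormatException.
            int cof = Integer.parseInt(w.getAttribute("cof"));
            // startsWith returns false for offsets outside the text, so no bounds check is needed.
            if (!unadornedText.startsWith(word, cof)) {
                bad.add(word);
            }
        }
        return bad;
    }
}
```

An empty result means every offset lines up under the stated assumptions. Words containing XML entities (for example `&amp;`) may be reported as mismatches, because the parser decodes them while the raw unadorned text does not.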