Package edu.northwestern.at.morphadorner.tools.addcharacteroffsets

Create derived MorphAdorner files with character offsets to word tokens.

Usage:

java edu.northwestern.at.morphadorner.tools.addcharacteroffsets.AddCharacterOffsets adornedinput.xml adornedoutput.xml unadornedoutput.xml
adornedinput.xml: a standard adorned output file produced by MorphAdorner.
adornedoutput.xml: the derived adorned file, with a character offset attribute (cof=) added to each <w> tag.
unadornedoutput.xml: the derived unadorned file to which the offsets in adornedoutput.xml refer.

The derived adorned output file, adornedoutput.xml, adds a cof= attribute to each <w> tag. The cof= value gives the character (not byte) offset at which that word begins in the unadornedoutput.xml file.

The unadornedoutput.xml file is the adorned input file with the <w> and <c> tags removed. In place of each tag it writes the word or whitespace text given by that tag.
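The following is a minimal Java sketch of the offset idea. It is not the AddCharacterOffsets implementation. It assumes that the adorned input has already been reduced to a list of hypothetical `Token`s, one for the text of each `<w>` (word) or `<c>` (whitespace) element in document order, and it ignores any other markup. Offsets are counted in Java `char`s (UTF-16 code units). That is also an assumption: this page does not say how the tool counts characters outside the Basic Multilingual Plane. The names `CharacterOffsetSketch`, `Token`, `WordOffset`, `Result`, `build`, and `utf8ByteOffset` are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch (hypothetical names): rebuild unadorned text and record word offsets. */
public class CharacterOffsetSketch {

    /** Text of one <w> (isWord = true) or <c> (isWord = false) element. */
    record Token(String text, boolean isWord) {}

    /** A word and its character offset (the cof= value) in the unadorned text. */
    record WordOffset(String word, int cof) {}

    /** The unadorned text plus one offset entry per word token. */
    record Result(String unadorned, List<WordOffset> offsets) {}

    /** Concatenates token text, noting where each word starts. */
    static Result build(List<Token> tokens) {
        StringBuilder text = new StringBuilder();
        List<WordOffset> offsets = new ArrayList<>();
        for (Token t : tokens) {
            if (t.isWord()) {
                // Offset = chars (UTF-16 code units) written so far, not bytes.
                offsets.add(new WordOffset(t.text(), text.length()));
            }
            text.append(t.text());
        }
        return new Result(text.toString(), offsets);
    }

    /** UTF-8 byte offset of the same position, for comparison only. */
    static int utf8ByteOffset(String text, int charOffset) {
        return text.substring(0, charOffset).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // "caf\u00e9 au lait": the accented e is one char but two UTF-8 bytes.
        Result r = build(List.of(
                new Token("caf\u00e9", true),
                new Token(" ", false),
                new Token("au", true),
                new Token(" ", false),
                new Token("lait", true)));
        for (WordOffset w : r.offsets()) {
            // Recover the word from the unadorned text using only its offset.
            String found = r.unadorned().substring(w.cof(), w.cof() + w.word().length());
            System.out.println("cof=" + w.cof()
                    + " utf8Byte=" + utf8ByteOffset(r.unadorned(), w.cof())
                    + " matches=" + found.equals(w.word()));
        }
    }
}
```

For the sample text "café au lait", the word "au" has character offset 5 but UTF-8 byte offset 6, because "é" takes two bytes in UTF-8. A byte-based offset would therefore point to the wrong position once any multi-byte character comes before the word.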