public class AddCharacterOffsets
extends java.lang.Object
Usage:
java edu.northwestern.at.morphadorner.tools.addcharacteroffsets.AddCharacterOffsets adornedinput.xml adornedoutput.xml unadornedoutput.xml
adornedinput.xml | Standard MorphAdorner adorned output file. |
adornedoutput.xml | Derived adorned file with character offsets added to each <w> tag. |
unadornedoutput.xml | Derived unadorned file whose word offsets are given in adornedoutput.xml file. |
The derived adorned output file adornedoutput.xml adds a cof= attribute to each <w> tag. The cof= attribute gives the character (not byte) offset of that word in the unadornedoutput.xml file. The unadornedoutput.xml file omits the <w> and <c> tags of the adorned input file and contains only the word and whitespace text that those tags enclose.
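The relationship between the two derived files can be sketched as follows. This is a minimal illustration, not MorphAdorner's implementation: the class name CofSketch, the method addOffsets, and the regular expression are hypothetical and are not the wPattern or cPattern fields listed below. It also assumes that offsets are zero-based, are counted in Java char units from the start of the unadorned text, and that word text contains no nested markup.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CofSketch {
    // Hypothetical pattern (an assumption, not MorphAdorner's own):
    // group 1 = tag name (w or c), group 2 = attributes, group 3 = enclosed text.
    private static final Pattern TAG =
        Pattern.compile("<(w|c)(\\s[^>]*)?>(.*?)</\\1>", Pattern.DOTALL);

    /** Returns { adorned text with cof= added, unadorned text }. */
    static String[] addOffsets(String adorned) {
        StringBuilder adornedOut = new StringBuilder();
        StringBuilder plainOut = new StringBuilder();
        Matcher m = TAG.matcher(adorned);
        int last = 0;
        while (m.find()) {
            // Text outside <w>/<c> elements is copied unchanged to both outputs.
            String between = adorned.substring(last, m.start());
            adornedOut.append(between);
            plainOut.append(between);

            String attrs = m.group(2) == null ? "" : m.group(2);
            String text = m.group(3);
            if (m.group(1).equals("w")) {
                // The word starts where the unadorned text currently ends.
                // length() counts chars, so this is a character offset,
                // not a byte offset.
                int cof = plainOut.length();
                adornedOut.append("<w").append(attrs)
                          .append(" cof=\"").append(cof).append("\">")
                          .append(text).append("</w>");
            } else {
                // <c> elements are kept as they are in the adorned output.
                adornedOut.append(m.group());
            }
            // The unadorned output keeps only the enclosed text.
            plainOut.append(text);
            last = m.end();
        }
        String tail = adorned.substring(last);
        adornedOut.append(tail);
        plainOut.append(tail);
        return new String[] { adornedOut.toString(), plainOut.toString() };
    }

    public static void main(String[] args) {
        String[] out = addOffsets(
            "<p><w xml:id=\"a1\">caf\u00e9</w><c> </c><w xml:id=\"a2\">noir</w></p>");
        System.out.println(out[0]); // adorned text with cof= attributes
        System.out.println(out[1]); // unadorned text
    }
}
```

In this sketch the second word receives cof="8". Because the preceding word contains a non-ASCII character, its byte offset in a UTF-8 encoded file would be 9, which shows why the distinction between character and byte offsets matters.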
Modifier and Type | Field and Description |
---|---|
protected static int | ATTRS |
protected static int | CDATA |
protected static int | CLEFT: Matcher groups for c. |
protected static java.lang.String | cPattern |
protected static PatternReplacer | creplacer |
protected static int | CRIGHT |
protected static int | LEFT: Matcher groups for w. |
protected static java.lang.String | LINE_SEPARATOR: Line separator. |
protected static int | MAXLINEWIDTH: Maximum line width. |
protected static int | RIGHT |
protected static int | WORD |
protected static java.lang.String | wPattern |
protected static PatternReplacer | wreplacer |
Constructor and Description |
---|
AddCharacterOffsets(java.lang.String[] args): Create derived adorned files with character offset attributes. |
Modifier and Type | Method and Description |
---|---|
static void | displayUsage(): Display program usage. |
static void | main(java.lang.String[] args): Main program. |
protected static final java.lang.String LINE_SEPARATOR
protected static java.lang.String wPattern
protected static PatternReplacer wreplacer
protected static final int LEFT
protected static final int ATTRS
protected static final int WORD
protected static final int RIGHT
protected static java.lang.String cPattern
protected static final int CLEFT
protected static final int CDATA
protected static final int CRIGHT
protected static PatternReplacer creplacer
protected static final int MAXLINEWIDTH