Northwestern University Information Technology
AddCharacterOffsets creates derived MorphAdorner files with character offsets to word tokens.
addcharacteroffsets adornedinput.xml adornedoutput.xml unadornedoutput.xml
|Standard MorphAdorner adorned output file.
|Derived adorned file with character offsets added to
|Derived unadorned file whose word offsets are given in adornedoutput.xml file.
The derived adorned output file adornedoutput.xml adds a cof= attribute to each <w> tag. The cof= attribute specifies the character (not byte) offset of each word in the unadornedoutput.xml file. The latter file removes the <w> and <c> tags from the adorned input file and outputs the word and whitespace text as specified by the <w> and <c> tags. (Note that cof= is not recognized by the TEI-Analytics scheme.)
The source code for AddCharacterOffsets is interesting in that it shows how to process an adorned file using regular expressions instead of a full XML parser.
|Announcements and News
|Announcements and news about changes to MorphAdorner
|Documentation for using MorphAdorner
|Downloading and installing the MorphAdorner client and server software
|Glossary of MorphAdorner terms
|Natural language processing references
|Licenses for MorphAdorner and Associated Software
|Online examples of MorphAdorner Server facilities.
|Slides from talks about MorphAdorner.
|Technical information for programmers using MorphAdorner
Academic Technologies and Research Services,
NU Library 2East, 1970 Campus Drive Evanston, IL 60208. |