Northwestern University Information Technology
MorphAdorner V2.0
AddCharacterOffsets creates derived MorphAdorner files with character offsets to word tokens.
Usage:
addcharacteroffsets adornedinput.xml adornedoutput.xml unadornedoutput.xml
where
adornedinput.xml | Standard MorphAdorner adorned output file. |
adornedoutput.xml | Derived adorned file with character offsets added to each <w> tag. |
unadornedoutput.xml | Derived unadorned file whose word offsets are given in the adornedoutput.xml file. |
The derived adorned output file adornedoutput.xml adds a cof= attribute to each <w> tag. The cof= attribute gives the character (not byte) offset of that word in the unadornedoutput.xml file. The unadorned file is produced by removing the <w> and <c> tags from the adorned input file and writing out the word and whitespace text specified by those tags. (Note that cof= is not recognized by the TEI-Analytics scheme.)
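The distinction between character and byte offsets matters as soon as the text contains non-ASCII characters. The following is a minimal Java sketch of that distinction, not code taken from MorphAdorner. It assumes that cof= counts Java char units from the start of the unadorned text; the sample text, the offset 8, and the names CofLookup, wordAt, and utf8ByteOffset are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of MorphAdorner.
class CofLookup {
    // Returns the token of the given length that starts at character offset cof.
    static String wordAt(String unadornedText, int cof, int length) {
        return unadornedText.substring(cof, cof + length);
    }

    // Returns the UTF-8 byte offset that corresponds to character offset cof.
    static int utf8ByteOffset(String unadornedText, int cof) {
        return unadornedText.substring(0, cof)
                            .getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Illustrative unadorned text; the accented letter takes two bytes in UTF-8.
        String text = "<p>Caf\u00e9 life</p>";
        int cof = 8; // assumed cof= value for the word "life"
        System.out.println(wordAt(text, cof, 4));
        System.out.println(utf8ByteOffset(text, cof));
    }
}
```

In this sample the character offset and the UTF-8 byte offset of the second word differ by one, because the accented letter before it occupies two bytes. A program that seeks into the unadorned file by bytes would need a conversion like utf8ByteOffset.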
The source code for AddCharacterOffsets is also a useful example of how to process an adorned file using regular expressions instead of a full XML parser.
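As a rough illustration of the regular-expression approach, the sketch below scans an adorned string for <w> and <c> elements, builds the unadorned text, and adds a cof= attribute to each <w> tag. It is not the actual AddCharacterOffsets source. It assumes that <w> and <c> elements contain only text, are never self-closing, and that all other markup is copied unchanged to both outputs; the class name RegexOffsetSketch, the method addOffsets, and the sample attributes are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not the MorphAdorner implementation.
class RegexOffsetSketch {
    // Group 1: attributes of a <w> tag, group 2: its text,
    // group 3: text of a <c> element.
    private static final Pattern TOKEN = Pattern.compile(
        "<w(\\s[^>]*)?>([^<]*)</w>|<c(?:\\s[^>]*)?>([^<]*)</c>");

    // Returns a two-element array: { adorned text with cof=, unadorned text }.
    static String[] addOffsets(String adorned) {
        StringBuilder adornedOut = new StringBuilder();
        StringBuilder unadornedOut = new StringBuilder();
        Matcher m = TOKEN.matcher(adorned);
        int last = 0;
        while (m.find()) {
            // Copy any markup between tokens to both outputs unchanged.
            String between = adorned.substring(last, m.start());
            adornedOut.append(between);
            unadornedOut.append(between);
            if (m.group(2) != null) {
                // A <w> element: its offset is the current length of the
                // unadorned output, counted in Java chars.
                String attrs = m.group(1) == null ? "" : m.group(1);
                int cof = unadornedOut.length();
                adornedOut.append("<w").append(attrs)
                          .append(" cof=\"").append(cof).append("\">")
                          .append(m.group(2)).append("</w>");
                unadornedOut.append(m.group(2));
            } else {
                // A <c> element: keep it as-is in the adorned output and
                // emit only its text in the unadorned output.
                adornedOut.append(m.group());
                unadornedOut.append(m.group(3));
            }
            last = m.end();
        }
        // Copy whatever follows the final token.
        String tail = adorned.substring(last);
        adornedOut.append(tail);
        unadornedOut.append(tail);
        return new String[] { adornedOut.toString(), unadornedOut.toString() };
    }

    public static void main(String[] args) {
        String in = "<p><w lem=\"the\">The</w><c> </c><w lem=\"cat\">cat</w></p>";
        String[] out = addOffsets(in);
        System.out.println(out[0]);
        System.out.println(out[1]);
    }
}
```

A regular expression works here only because of the simplifying assumptions above. Input with nested elements inside <w>, or a literal ">" inside an attribute value, would need a real XML parser.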
Academic Technologies and Research Services,
NU Library 2East, 1970 Campus Drive, Evanston, IL 60208.