Northwestern University Information Technology
MorphAdorner V2.0
AddCharacterOffsets creates derived MorphAdorner files that record the character offset of each word token.
Usage:
addcharacteroffsets adornedinput.xml adornedoutput.xml unadornedoutput.xml
where
| adornedinput.xml | Standard MorphAdorner adorned output file. |
| adornedoutput.xml | Derived adorned file with character offsets added to each <w> tag. |
| unadornedoutput.xml | Derived unadorned file whose word offsets are given in the adornedoutput.xml file. |
The derived adorned output file, adornedoutput.xml, adds a cof= attribute to each <w> tag. The cof= attribute gives the character (not byte) offset of that word in the unadornedoutput.xml file. The unadorned file is the adorned input with the <w> and <c> tags removed, leaving the word and whitespace text as specified by those tags. (Note that cof= is not recognized by the TEI-Analytics scheme.)
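To illustrate how the cof= values relate to the unadorned file, here is a minimal Python sketch that looks up each word at its recorded offset. It is not part of MorphAdorner. The function name check_offsets, the regular expression, and the sample strings are hypothetical, and the sketch assumes that offsets are zero-based and that each <w> element encloses its word text directly; neither assumption is confirmed by the description above.

```python
import re

# Hypothetical pattern for <w ... cof="N" ...>text</w>; real adorned files
# may lay out their attributes differently.
W_WITH_COF = re.compile(r'<w\b[^>]*\bcof="(\d+)"[^>]*>(.*?)</w>', re.DOTALL)


def check_offsets(adorned, unadorned):
    """Return the (cof, word) pairs whose word is NOT found at that offset.

    Assumes zero-based character offsets into the unadorned text.
    """
    mismatches = []
    for match in W_WITH_COF.finditer(adorned):
        cof = int(match.group(1))
        word = match.group(2)
        # Slicing a Python str counts characters, not bytes.
        if unadorned[cof:cof + len(word)] != word:
            mismatches.append((cof, word))
    return mismatches


# Invented sample data, not real MorphAdorner output.
adorned = '<p><w cof="3">café</w><c> </c><w cof="8">noir</w></p>'
unadorned = '<p>café noir</p>'

mismatches = check_offsets(adorned, unadorned)

# In UTF-8 the "é" occupies two bytes, so the byte offset of "noir"
# differs from its character offset.
byte_offset = unadorned.encode("utf-8").index(b"noir")
char_offset = unadorned.index("noir")
```

In this invented sample the character offset of "noir" is 8 while its UTF-8 byte offset is 9, which is why a consumer of cof= should index into decoded text rather than raw bytes.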
The source code for AddCharacterOffsets is also a useful example of how to process an adorned file using regular expressions instead of a full XML parser.
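The regular-expression approach can be sketched in Python as follows. This is an illustration of the general technique under stated assumptions, not a port of the actual AddCharacterOffsets source: the function name add_character_offsets and the pattern are hypothetical, offsets are assumed to be zero-based, and every <w> and <c> element is assumed to have an explicit closing tag with no nesting.

```python
import re

# Hypothetical pattern: a <w> or <c> element, its attributes, and its text.
TOKEN = re.compile(r'<(w|c)\b([^>]*)>(.*?)</\1>', re.DOTALL)


def add_character_offsets(adorned):
    """Return (adorned_with_cof, unadorned) derived from adorned text.

    Assumes zero-based offsets and non-nested <w>/<c> elements.
    """
    adorned_out = []
    unadorned_out = []
    pos = 0      # current position in the adorned input
    offset = 0   # current character offset in the unadorned output

    for match in TOKEN.finditer(adorned):
        # Copy everything between tokens (other markup) to both outputs.
        between = adorned[pos:match.start()]
        adorned_out.append(between)
        unadorned_out.append(between)
        offset += len(between)

        tag, attrs, text = match.group(1), match.group(2), match.group(3)
        if tag == "w":
            # Record where this word will start in the unadorned output.
            adorned_out.append(f'<w{attrs} cof="{offset}">{text}</w>')
        else:
            adorned_out.append(match.group(0))

        # The unadorned output keeps only the enclosed text.
        unadorned_out.append(text)
        offset += len(text)
        pos = match.end()

    tail = adorned[pos:]
    adorned_out.append(tail)
    unadorned_out.append(tail)
    return "".join(adorned_out), "".join(unadorned_out)


# Invented sample input, not real MorphAdorner output.
sample = '<p><w pos="n1">café</w><c> </c><w pos="j">noir</w></p>'
with_cof, plain = add_character_offsets(sample)
```

A single pass over the matches is enough because the offset counter advances by exactly the text written to the unadorned output. A real adorned file may contain constructs this pattern does not handle, such as self-closing elements, so a full XML parser remains the safer choice for arbitrary input.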
Academic Technologies and Research Services, NU Library 2East, 1970 Campus Drive, Evanston, IL 60208.