MorphAdorner V2.0
AddCharacterOffsets creates derived MorphAdorner files with character offsets to word tokens.
Usage:
addcharacteroffsets adornedinput.xml adornedoutput.xml unadornedoutput.xml
where
adornedinput.xml | Standard MorphAdorner adorned output file. |
adornedoutput.xml | Derived adorned file with character offsets added to each <w> tag as a cof= attribute. |
unadornedoutput.xml | Derived unadorned file whose word offsets are given in the adornedoutput.xml file. |
The derived adorned output file adornedoutput.xml adds a cof= attribute to each <w> tag. The cof= attribute specifies the character (not byte) offset of each word in the unadornedoutput.xml file. The latter file removes the <w> and <c> tags from the adorned input file and outputs the word and whitespace text as specified by the <w> and <c> tags. (Note that cof= is not recognized by the TEI-Analytics scheme.)
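As an illustration of how the cof= values might be consumed, the following Python sketch checks that each word in the derived adorned file appears at its recorded offset in the unadorned file. It is not part of MorphAdorner. The function name find_offset_mismatches is hypothetical, and the sketch assumes that offsets are zero-based, that cof= holds a plain decimal number, and that <w> elements contain only text.

```python
import re

# Assumption: each <w> element carries cof="N" and contains only text (no nested tags).
W_WITH_COF = re.compile(r'<w\b[^>]*\bcof="(\d+)"[^>]*>([^<]*)</w>')


def find_offset_mismatches(adorned_text, unadorned_text):
    """Return (offset, word) pairs whose word is not found at that offset.

    Offsets are treated as zero-based character (not byte) positions,
    which is an assumption of this sketch.
    """
    mismatches = []
    for match in W_WITH_COF.finditer(adorned_text):
        offset = int(match.group(1))
        word = match.group(2)
        # Compare the word text with the slice of the unadorned file at cof.
        if unadorned_text[offset:offset + len(word)] != word:
            mismatches.append((offset, word))
    return mismatches
```

Both files should be read as decoded text (for example with open(path, encoding="utf-8")), so that the positions count characters rather than bytes.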
The source code for AddCharacterOffsets shows how to process an adorned file using regular expressions instead of a full XML parser.
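The regular-expression approach can be sketched in a few lines of Python. This is a simplified illustration, not the AddCharacterOffsets source: the function name add_character_offsets is hypothetical, and the sketch assumes that <w> and <c> elements are never nested, contain only text, and that offsets are zero-based positions in the unadorned output.

```python
import re
from typing import Tuple

# Matches <w ...>text</w> or <c ...>text</c>.
# Assumption: these elements are not nested and their text contains no '<'.
TOKEN = re.compile(r"<(w|c)\b([^>]*)>([^<]*)</\1>")


def add_character_offsets(adorned: str) -> Tuple[str, str]:
    """Return (adorned text with cof= added, unadorned text)."""
    adorned_out = []
    unadorned_out = []
    offset = 0  # characters written to the unadorned output so far
    pos = 0     # current position in the adorned input
    for match in TOKEN.finditer(adorned):
        # Copy any markup between tokens unchanged to both outputs.
        between = adorned[pos:match.start()]
        adorned_out.append(between)
        unadorned_out.append(between)
        offset += len(between)

        tag, attrs, text = match.group(1), match.group(2), match.group(3)
        if tag == "w":
            # Record where this word starts in the unadorned output.
            adorned_out.append(f'<w{attrs} cof="{offset}">{text}</w>')
        else:
            adorned_out.append(match.group(0))
        # The unadorned output keeps only the word or whitespace text.
        unadorned_out.append(text)
        offset += len(text)
        pos = match.end()

    # Copy whatever follows the last token.
    tail = adorned[pos:]
    adorned_out.append(tail)
    unadorned_out.append(tail)
    return "".join(adorned_out), "".join(unadorned_out)
```

Because the pattern only recognizes <w> and <c> elements, all other markup passes through untouched, which is what makes a full XML parser unnecessary for this task. The trade-off is that the sketch relies on the adorned file following the assumed layout.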