Northwestern University Information Technology
MorphAdorner V2.0
AdornedToSimpleTEIP5 converts a base-level MorphAdorner file to a simpler, more TEI P5-like format.
Usage:
adornedtosimpleteip5 outputdirectory [usereg|usechoice] interpgrp.xml goodfiles.txt badfiles.txt adorned1.xml adorned2.xml ...
AdornedToSimpleTEIP5 converts the base form of an adorned TEI file, which adds custom attributes to word <w> elements, to a simpler, more TEI P5-compatible format, as follows.
In standard TEI P5, you cannot store a standardized spelling in a reg attribute. One approach is to combine <choice>, <orig>, and <reg> elements so that each <w> element carries its part of a double stream of original and standardized spellings, as in this adorned encoding of "wylle anone" from an early 16th-century text:
<w xml:id="someid1" lemma="will" ana="#vmb"> <choice> <orig>wylle</orig> <reg>will</reg> </choice> </w>
<w xml:id="someid2" lemma="anon" ana="#av"> <choice> <orig>anone</orig> <reg>anon</reg> </choice> </w>
Alternatively, you can customize P5 and restore a reg attribute that lets you encode the same phenomena in a manner that programmers -- and in particular programmers with limited skills -- are likely to find more intuitive and economical:
<w xml:id="someid1" lemma="will" reg="will" ana="#vmb">wylle</w>
<w xml:id="someid2" lemma="anon" reg="anon" ana="#av">anone</w>
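The relationship between the two encodings can be illustrated with a short Python sketch. This is not how AdornedToSimpleTEIP5 is implemented; the function names are hypothetical, and the sample assumes a <w> element with no TEI namespace and no child markup other than the <choice> structure shown above.

```python
import xml.etree.ElementTree as ET


def reg_attribute_to_choice(w):
    """Rewrite <w reg="X">Y</w> in place as
    <w><choice><orig>Y</orig><reg>X</reg></choice></w>."""
    reg_value = w.attrib.pop("reg", None)
    if reg_value is None:
        return w  # no standardized spelling recorded; leave the word alone
    original = w.text or ""
    w.text = None
    choice = ET.SubElement(w, "choice")
    ET.SubElement(choice, "orig").text = original
    ET.SubElement(choice, "reg").text = reg_value
    return w


def choice_to_reg_attribute(w):
    """Inverse of the function above: fold <choice> back into a reg attribute."""
    choice = w.find("choice")
    if choice is None:
        return w  # already in attribute form
    original = choice.findtext("orig", default="")
    reg_value = choice.findtext("reg", default="")
    w.remove(choice)
    w.text = original
    w.set("reg", reg_value)
    return w


# Hypothetical sample taken from the attribute-style example above.
sample = '<w xml:id="someid1" lemma="will" reg="will" ana="#vmb">wylle</w>'
word = ET.fromstring(sample)

reg_attribute_to_choice(word)
as_choice = ET.tostring(word, encoding="unicode")

choice_to_reg_attribute(word)
as_attribute = ET.tostring(word, encoding="unicode")

print(as_choice)
print(as_attribute)
```

Both forms carry the same information, so the round trip returns the word to its attribute form. A real adorned file would also need TEI namespace handling, which this sketch leaves out.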
For many purposes using an attribute is preferable to a choice element because the attribute leaves the token sequence undisturbed, and the added attribute value can be stored in the standard MorphAdorner change log format.
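One way to read the point about the token sequence: code that simply collects the text content of the document sees only the original spellings in the attribute form, but sees original and standardized spellings interleaved in the <choice> form. The sketch below illustrates this under stated assumptions: the wrapping <s> element and the helper name running_text are hypothetical, and the TEI namespace is omitted.

```python
import xml.etree.ElementTree as ET

# Attribute form: the standardized spelling lives in the reg attribute.
ATTRIBUTE_FORM = """<s>
<w xml:id="someid1" lemma="will" reg="will" ana="#vmb">wylle</w>
<w xml:id="someid2" lemma="anon" reg="anon" ana="#av">anone</w>
</s>"""

# Choice form: the standardized spelling is element content.
CHOICE_FORM = """<s>
<w xml:id="someid1" lemma="will" ana="#vmb"> <choice> <orig>wylle</orig> <reg>will</reg> </choice> </w>
<w xml:id="someid2" lemma="anon" ana="#av"> <choice> <orig>anone</orig> <reg>anon</reg> </choice> </w>
</s>"""


def running_text(xml_text):
    """Return the whitespace-separated text content of the fragment,
    as a naive text extractor would see it."""
    root = ET.fromstring(xml_text)
    return "".join(root.itertext()).split()


attribute_tokens = running_text(ATTRIBUTE_FORM)
choice_tokens = running_text(CHOICE_FORM)

print(attribute_tokens)
print(choice_tokens)
```

With the <choice> form, a consumer has to know to pick either <orig> or <reg> for every word before it can recover a clean token stream; with the attribute form, the element content is already that stream.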
AdornedToSimpleTEIP5 allows you to use either of these two approaches.
Important: Many other MorphAdorner utilities do not yet work properly with simplified adorned texts created using the <choice> structure.
Strictly speaking, a TEI interpGrp element should be added to each TEI XML output file to specify the definitions for the parts of speech used. The MorphAdorner release materials include a nuposinterpgrp.xml file in the release data/ directory which defines an interpGrp for the NUPos tag set. This file can be specified as the value of AdornedToSimpleTEIP5's interpgrp.xml parameter.
Academic Technologies and Research Services,
NU Library 2East, 1970 Campus Drive, Evanston, IL 60208.