MorphAdorner: Merging AnnoLex corrections with adorned TEI XML

AnnoLex is a collaborative data curation tool for use with Text Creation Partnership texts. It supports identifying and correcting incompletely or incorrectly transcribed words, and it can also be used to correct algorithmically applied lemmatization and part-of-speech tagging by hand. AnnoLex was developed by Craig Berry and Martin Mueller.

MergeAnnolexCorrectionsIntoAdornedXML merges corrections made in AnnoLex back into the source adorned TEI XML files.

Usage:

mergeannolexcorrectionsintoadornedxml correctionsdirectory outputdirectory inputfiles

where

correctionsdirectory is the input directory containing the AnnoLex correction files in tabular format.
outputdirectory is the output directory for the corrected adorned TEI XML files.
inputfiles specifies the input adorned XML files into which the AnnoLex-produced corrections are merged. These must be in the base adorned format, not the simplified TEI P5 format.

Each corrections file is a tab-separated UTF-8 file containing the following columns:

Work ID.
Word ID.
Old spelling.
Corrected spelling.
Standard spelling.
Corrected lemmata.
Corrected parts of speech.
Operation: 1=update, 2=insert, 3=delete, 5=delete nearest gap.

The corrected spelling, lemmata, and parts of speech may all be empty when the operation is 3 (delete).
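The column layout above can be sketched as a small reader. This is only an illustration of the file format as described here, not MorphAdorner's own code: the `Correction` class, its field names, and the `read_corrections` function are hypothetical, and the sketch assumes the file has no header row and no quoting.

```python
import csv
from dataclasses import dataclass

# Operation codes as listed in the column description above.
OPERATIONS = {1: "update", 2: "insert", 3: "delete", 5: "delete nearest gap"}


@dataclass
class Correction:
    """One row of a corrections file (field names are illustrative)."""
    work_id: str
    word_id: str
    old_spelling: str
    corrected_spelling: str
    standard_spelling: str
    corrected_lemmata: str
    corrected_pos: str
    operation: int


def read_corrections(stream):
    """Yield one Correction per tab-separated line of an open text stream."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for line_no, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 8:
            raise ValueError(f"line {line_no}: expected 8 columns, got {len(row)}")
        operation = int(row[7])
        if operation not in OPERATIONS:
            raise ValueError(f"line {line_no}: unknown operation {operation}")
        # Empty spelling, lemmata, and part-of-speech fields are kept as "",
        # which is the expected shape of a delete (3) row.
        yield Correction(*row[:7], operation=operation)
```

To read an actual file, a stream opened with `open(path, encoding="utf-8", newline="")` can be passed to `read_corrections`.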

The value of the "ord" (word ordinal) attribute of each word is adjusted to account for inserted and deleted words. The values of the "reg" (standard spelling) and "tok" (original token) attributes are generated as needed for updated and inserted words.
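One way to picture the ordinal adjustment is as a renumbering pass over the word list once insertions and deletions have been applied. The sketch below is an assumption about the general idea, not the tool's actual algorithm: it represents words as plain dictionaries, and both the `renumber_ords` name and the choice of consecutive ordinals starting at 1 are hypothetical.

```python
def renumber_ords(words, start=1):
    """Return copies of the word records with consecutive "ord" values.

    `words` is the list of word records in document order, after any
    insertions and deletions have already been applied. Starting at 1
    is an assumption made for this sketch.
    """
    return [dict(word, ord=start + offset) for offset, word in enumerate(words)]


# Hypothetical records: the word that had ord 2 was deleted, leaving a gap.
remaining = [{"id": "w1", "ord": 1}, {"id": "w3", "ord": 3}]
renumbered = renumber_ords(remaining)
```

The function returns new records instead of modifying its input, so the original ordinals stay available for comparison.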

Whitespace markers "<c> </c>" are added and deleted as needed when tokens are added or deleted. In general, most added punctuation and symbols do not require added whitespace markers. When tokens are deleted, sequences of "<c> </c><c> </c> ..." are compressed to a single "<c> </c>" entry.
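The compression of repeated whitespace markers can be illustrated with a regular expression over serialized XML text. This is a simplified sketch under the assumption that the markers appear as directly adjacent literal "<c> </c>" strings; the actual tool may well operate on parsed XML elements instead, and the `compress_whitespace_markers` name is hypothetical.

```python
import re

MARKER = "<c> </c>"

# Two or more whitespace markers in a row, with nothing between them.
RUN_OF_MARKERS = re.compile(r"(?:<c> </c>){2,}")


def compress_whitespace_markers(xml_text):
    """Collapse each run of adjacent whitespace markers into a single marker."""
    return RUN_OF_MARKERS.sub(MARKER, xml_text)


# After deleting a token between two markers, the markers end up adjacent.
before = "<w>a</w><c> </c><c> </c><w>b</w>"
after = compress_whitespace_markers(before)
```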
