Northwestern University Information Technology
MorphAdorner Northwestern
Merging AnnoLex corrections with adorned TEI XML

AnnoLex is a collaborative data curation tool for use with Text Creation Partnership texts. AnnoLex allows for the identification and correction of incompletely or incorrectly transcribed words. It can also be used to manually correct algorithmically applied lemmatization and part-of-speech tagging. AnnoLex was developed by Craig Berry and Martin Mueller.

MergeAnnolexCorrectionsIntoAdornedXML merges corrections developed in AnnoLex back into the source adorned TEI XML files.


mergeannolexcorrectionsintoadornedxml correctionsdirectory outputdirectory inputfiles


  • correctionsdirectory is the input directory containing the AnnoLex correction files in tabular format.
  • outputdirectory is the output directory for the corrected adorned TEI XML files.
  • inputfiles specifies the adorned TEI XML input files into which the AnnoLex-produced corrections are merged. These must be in the base adorned format, not the simplified TEI P5 format.

Each corrections file is a tab-separated UTF-8 text file containing the following columns, in order.

  1. Work ID.
  2. Word ID.
  3. Old spelling.
  4. Corrected spelling.
  5. Standard spelling.
  6. Corrected lemmata.
  7. Corrected parts of speech.
  8. Operation: 1=update, 2=insert, 3=delete, 5=delete nearest gap.

The corrected spelling, lemmata, and parts of speech may all be empty when the operation is 3 (delete).
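
As an illustration of the column layout above, here is a minimal Python sketch that reads one corrections file into typed records. It assumes the file has no header row and exactly eight tab-separated columns per line; the names `Operation`, `Correction`, and `read_corrections` are hypothetical and are not part of MorphAdorner. Operation codes other than 1, 2, 3, and 5 are rejected, since those are the only codes listed.

```python
import csv
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, List


class Operation(IntEnum):
    """Operation codes from column 8 of the corrections file."""
    UPDATE = 1
    INSERT = 2
    DELETE = 3
    DELETE_NEAREST_GAP = 5


@dataclass
class Correction:
    """One row of the corrections file (columns 1 to 8, in order)."""
    work_id: str
    word_id: str
    old_spelling: str
    corrected_spelling: str
    standard_spelling: str
    lemmata: str
    parts_of_speech: str
    operation: Operation


def read_corrections(lines: Iterable[str]) -> List[Correction]:
    """Parse tab-separated correction rows (hypothetical helper; assumes no header row)."""
    corrections: List[Correction] = []
    # QUOTE_NONE treats quote characters as ordinary text, not CSV quoting.
    for row in csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE):
        if not row:
            continue  # skip blank lines
        if len(row) != 8:
            raise ValueError(f"expected 8 columns, got {len(row)}: {row!r}")
        # Columns 4-7 may be empty strings, e.g. for deletes (operation 3).
        *fields, op = row
        # Operation(int(op)) raises ValueError for unlisted codes such as 4.
        corrections.append(Correction(*fields, operation=Operation(int(op))))
    return corrections
```

To read a file from the corrections directory, open it with `open(path, encoding="utf-8", newline="")` and pass the file object to `read_corrections`. The work and word IDs in any test data you construct are placeholders, not real TCP identifiers.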

The value of the "ord" (word ordinal) attribute of each word is adjusted to account for inserted and deleted words. The values of the "reg" (standard spelling) and "tok" (original token) attributes are generated as needed for updated and inserted words.
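
One simple way to picture the ordinal adjustment is to renumber every word after all inserts and deletes have been applied. The Python sketch below does that with the standard `xml.etree.ElementTree` module. It assumes that words are `<w>` elements (with or without a namespace), that ordinals start at 1, and that they run over all words in document order; `renumber_ordinals` is a hypothetical name, and MorphAdorner itself may adjust ordinals differently.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def renumber_ordinals(root: ET.Element) -> int:
    """Reassign ord="1", "2", ... to <w> elements in document order.

    Returns the number of words renumbered. Assumes ordinals start at 1.
    """
    count = 0
    # root.iter() walks the tree depth-first, i.e. in document order.
    for elem in root.iter():
        # Comments and processing instructions have non-string tags; skip them.
        if isinstance(elem.tag, str) and local_name(elem.tag) == "w":
            count += 1
            elem.set("ord", str(count))
    return count
```

Renumbering the whole file trades a little speed for simplicity: it cannot leave gaps or duplicates, however many words were inserted or deleted.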

Whitespace markers "<c> </c>" are added and deleted as needed when tokens are added or deleted. In general, added punctuation and symbols do not require added whitespace markers. When tokens are deleted, sequences of "<c> </c><c> </c> ..." are compressed to a single "<c> </c>" entry.
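
The compression step can be sketched in Python with `xml.etree.ElementTree`. This is an illustration under stated assumptions, not MorphAdorner's implementation. It treats a `<c>` element whose text is exactly one space as a whitespace marker, and it treats two markers as adjacent only when no text other than formatting whitespace lies between them. The function names are hypothetical, and the sketch compresses runs across the whole tree rather than only around deleted tokens.

```python
import xml.etree.ElementTree as ET


def is_space_marker(elem: ET.Element) -> bool:
    """True for a childless <c> element whose text is a single space."""
    return (isinstance(elem.tag, str)
            and elem.tag.rsplit("}", 1)[-1] == "c"
            and elem.text == " "
            and len(elem) == 0)


def compress_space_markers(root: ET.Element) -> int:
    """Collapse runs of adjacent <c> </c> siblings into one; return removals."""
    removed = 0
    # Snapshot the iteration so removing children does not disturb it.
    for parent in list(root.iter()):
        prev = None
        for child in list(parent):
            adjacent = prev is not None and not (prev.tail or "").strip()
            if adjacent and is_space_marker(prev) and is_space_marker(child):
                # Keep any text that followed the removed marker.
                prev.tail = child.tail
                parent.remove(child)
                removed += 1
                continue  # prev still points at the surviving marker
            prev = child
    return removed
```

Working on the element tree rather than on the serialized text avoids accidentally matching markup inside attribute values or comments.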
