MorphAdorner Northwestern
 
Welcome

This is MorphAdorner v2.0, initially released in September 2013. The online documentation is complete but still needs some editing. A draft of the printable documentation is now available in PDF, EPUB, and MOBI formats.

MorphAdorner is a Java command-line program that acts as a pipeline manager for processes performing morphological adornment of words in a text. We use the term "adornment" in preference to terms such as "annotation" or "tagging", which carry too many alternative and confusing meanings. Adornment harkens back to the medieval practice of manuscript adornment or illumination, in which scribes attached pictures and marginal comments to texts.

Currently MorphAdorner provides methods for adorning text with standard spellings, parts of speech and lemmata. MorphAdorner also provides facilities for tokenizing text, recognizing sentence boundaries, and extracting names and places. You can find out more about each of these facilities, and see online demonstrations of each, by consulting the documentation section of this web site.

MorphAdorner underwent continuous development in tandem with three projects: WordHoard, Monk, and Virtual Orthographic Standardization and Part of Speech Tagging (VOSPOS), as well as smaller-scale faculty research projects at Northwestern University. All three projects are now complete. Although MorphAdorner was used in these projects, it is a separate project in its own right.

MorphAdorner saw heavy use in the Monk project, which sought to adorn a large number of English-language texts dating from the Early Modern English period to the start of the twentieth century. By the project's end in April 2009, about 151.5 million words had been adorned.

In October 2012 we began a new MorphAdorner v2.0 project to improve MorphAdorner's processing of several Text Creation Partnership (TCP) corpora beyond what was attempted during the Monk project. These corpora included Early English Books Online (EEBO), Eighteenth Century Collections Online (ECCO), and the Evans Early American Imprint Collection. You can read more about MorphAdorner's processing of TCP texts.

We improved MorphAdorner's integration with Abbot. Abbot converts dissimilar collections of XML texts into a common interoperable form. Abbot was designed and implemented by Brian L. Pytlik Zillig, Stephen Ramsay, Martin Mueller, and Frank Smutniak.

Our goal in the Abbot and EEBO MorphAdorner collaboration is to turn the TCP texts into the foundation for a "Book of English," defined as:

  • a large, growing, collaboratively curated, and public domain corpus of written English since its earliest modern form
  • with full bibliographical detail
  • and light but consistent structural and linguistic annotation.

We also replaced the makeshift demonstration servlets of MorphAdorner v1.0 with a separate MorphAdorner Server. The MorphAdorner Server allows access to many MorphAdorner facilities through HTTP-based web services. These services can be accessed using simple web forms or by any programming language which supports web forms and HTTP. The online examples of MorphAdorner facilities on this web site use JavaScript to access the services provided by a local instance of the MorphAdorner Server.
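As a rough illustration of what "any programming language which supports web forms and HTTP" can look like in practice, here is a minimal Java sketch that submits form fields the way a browser form would. The endpoint URL passed on the command line and the field name `text` are hypothetical placeholders, not the MorphAdorner Server's actual service or parameter names; consult the server documentation for the real ones. The sketch assumes Java 11 or later for `java.net.http`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class AdornerFormClient {

    /** Encodes fields as application/x-www-form-urlencoded, the format a web form submits. */
    public static String encodeForm(Map<String, String> fields) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            joiner.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    /** Builds a POST request carrying the encoded form fields. Nothing is sent here. */
    public static HttpRequest buildRequest(URI endpoint, Map<String, String> fields) {
        return HttpRequest.newBuilder(endpoint)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical field name; the real services define their own parameters.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("text", "Loue is not loue which alters");

        if (args.length == 0) {
            // No service URL given: show the form body that would be sent, then stop.
            System.out.println(encodeForm(fields));
            return;
        }

        // args[0] is the URL of a service on a running server (an assumption of this sketch).
        HttpRequest request = buildRequest(URI.create(args[0]), fields);
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

Run without arguments, the program only prints the encoded form body. Given a service URL as its first argument, it posts the form and prints the HTTP status and response body. Keeping request construction separate from sending makes the encoding step easy to check without a running server.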

Please see the modification history for a general overview of the changes from MorphAdorner v1 to v2.
