Poets that lasting marble seek,
Must carve in Latin or in Greek.
We write in sand, our language grows,
And like the tide, our work o'erflows.

-- Edmund Waller



Northwestern
MorphAdorner

Language Recognizer
Lemmatizer
Lexicon Lookup
Name Recognizer
Parser
Part of Speech Tagger
Pluralizer
Sentence Splitter
Spelling Standardizer
Text Segmenter
Verb Conjugator
Word Tokenizer

  Welcome
 
 

MorphAdorner is a Java command-line program that acts as a pipeline manager for processes that perform morphological adornment of words in a text. We use the term "adornment" in preference to terms such as "annotation" or "tagging", which carry too many alternative and confusing meanings. Adornment harkens back to the medieval sense of manuscript adornment or illumination -- attaching pictures and marginal comments to texts.

Currently MorphAdorner provides methods for adorning text with standard spellings, parts of speech, and lemmata. MorphAdorner also provides facilities for tokenizing text, recognizing sentence boundaries, and extracting names and places. You can find out more about each of these facilities, and see online demonstrations of each, by choosing an item from the site menu.
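
To make the idea of a pipeline of adornment stages concrete, here is a minimal sketch in Java. It is an illustration only: the class AdornmentPipelineSketch, its Word type, the stage methods, and the two tiny lookup tables are all hypothetical names and data invented for this example, not MorphAdorner's actual classes or lexicons. It tokenizes a string, then runs a spelling stage and a lemma stage in order over the same word list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Illustrative sketch only: none of these names are MorphAdorner classes. */
public class AdornmentPipelineSketch {

    /** A token together with the adornments attached to it so far. */
    static final class Word {
        final String spelling;   // spelling as found in the text
        String standard;         // standard spelling, once assigned
        String lemma;            // lemma, once assigned

        Word(String spelling) { this.spelling = spelling; }
    }

    // Toy lookup tables invented for this example.
    static final Map<String, String> STANDARD = new HashMap<>();
    static final Map<String, String> LEMMA = new HashMap<>();
    static {
        STANDARD.put("historyes", "histories");
        STANDARD.put("troye", "troy");
        LEMMA.put("histories", "history");
    }

    /** Naive tokenizer: splits on anything that is not a letter or apostrophe. */
    static List<Word> tokenize(String text) {
        List<Word> words = new ArrayList<>();
        for (String token : text.split("[^\\p{L}']+")) {
            if (!token.isEmpty()) {
                words.add(new Word(token));
            }
        }
        return words;
    }

    /** Stage 1: attach a standard spelling (falls back to the lowercased original). */
    static List<Word> standardize(List<Word> words) {
        for (Word w : words) {
            String key = w.spelling.toLowerCase(Locale.ROOT);
            w.standard = STANDARD.getOrDefault(key, key);
        }
        return words;
    }

    /** Stage 2: attach a lemma (falls back to the standard spelling). */
    static List<Word> lemmatize(List<Word> words) {
        for (Word w : words) {
            w.lemma = LEMMA.getOrDefault(w.standard, w.standard);
        }
        return words;
    }

    /** The pipeline manager: runs each stage in order over the same word list. */
    static List<Word> adorn(String text) {
        List<UnaryOperator<List<Word>>> stages = new ArrayList<>();
        stages.add(AdornmentPipelineSketch::standardize);
        stages.add(AdornmentPipelineSketch::lemmatize);

        List<Word> words = tokenize(text);
        for (UnaryOperator<List<Word>> stage : stages) {
            words = stage.apply(words);
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints one line per token: original spelling, standard spelling, lemma.
        for (Word w : adorn("Recuyell of the Historyes of Troye")) {
            System.out.println(w.spelling + "\t" + w.standard + "\t" + w.lemma);
        }
    }
}
```

Keeping each stage as a function from word list to word list means stages can be added, removed, or reordered without touching the others. That is the general appeal of a pipeline design, though the real program's structure may differ.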

MorphAdorner has undergone continuous development in tandem with three projects: WordHoard, Monk, and Virtual Orthographic Standardization and Part of Speech Tagging (VOSPOS), as well as smaller-scale faculty research projects at Northwestern University. All three projects are now complete. Although MorphAdorner has been used in these projects, it is a separate project in its own right.

MorphAdorner saw its heaviest use in the Monk project, which sought to adorn a large number of English-language texts from the early Modern English period to the start of the twentieth century. About 151.5 million words had been adorned by the project's end in April 2009.

Our efforts to adorn English texts covering a period of over four hundred years must deal with the fact that the English language has changed significantly even since the start of the early modern period around 1470 A.D. The Great Vowel Shift was only about half complete at that time. Spelling was not at all standardized, and early printed texts reflect the differences in pronunciation. In 1475 William Caxton published (in Bruges) the first book printed in English, Recuyell of the Historyes of Troye. That short title reveals the orthographic variety that persisted until the late eighteenth century.

Today there remain differences among British, American, and Canadian spellings. Unlike the spelling variation of the early modern period, these differences are reasonably regular and generally easy to handle.
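
As a rough illustration of why regular differences are easy to handle, the sketch below rewrites a few British-style endings to American ones and accepts a rewrite only if the result appears in a word list. The class name, the three suffix rules, and the tiny lexicon are assumptions made up for this example; this is not MorphAdorner's spelling standardizer. A real system would need a full lexicon and many more rules.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Illustrative sketch only; not MorphAdorner's spelling standardizer. */
public class RegionalSpellingSketch {

    // Suffix rewrite rules: {British-style ending, American ending}. A small sample.
    static final String[][] RULES = {
        {"our", "or"},   // colour   -> color
        {"tre", "ter"},  // centre   -> center
        {"ise", "ize"},  // organise -> organize
    };

    // Toy lexicon of accepted target spellings, invented for this example.
    static final Set<String> LEXICON = new HashSet<>(Arrays.asList(
        "color", "honor", "center", "theater", "organize", "hour", "four"));

    /**
     * Returns the lowercased word, rewritten to an American spelling when a
     * suffix rule applies AND the rewritten form is found in the lexicon.
     */
    static String toAmerican(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        for (String[] rule : RULES) {
            if (lower.endsWith(rule[0])) {
                String stem = lower.substring(0, lower.length() - rule[0].length());
                String candidate = stem + rule[1];
                if (LEXICON.contains(candidate)) {
                    return candidate;
                }
            }
        }
        return lower; // no rule produced a known word, so leave it alone
    }

    public static void main(String[] args) {
        for (String w : new String[] {"colour", "centre", "organise", "hour"}) {
            System.out.println(w + " -> " + toAmerican(w));
        }
    }
}
```

The lexicon check is what keeps the rules safe: "hour" ends in "our", but the candidate "hor" is not a known word, so "hour" is returned unchanged.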

 

Academic Technologies  NU Library 2East  1970 Campus Drive  Evanston, IL 60208
E-mail: pib@northwestern.edu
Last updated Sun Mar 15 05:52:44 2009
© 2007, 2008 Northwestern University