|
MorphAdorner is a Java command-line program which acts as a pipeline manager
for processes performing morphological adornment of words in a text.
We use the term "adornment" in preference to terms such as "annotation" or
"tagging", which carry too many alternative and confusing meanings.
The term harkens back to the medieval sense of manuscript adornment or
illumination: attaching pictures and marginal comments to texts.
Currently MorphAdorner provides methods for adorning text with
standard spellings, parts of speech and lemmata. MorphAdorner also provides
facilities for tokenizing text, recognizing sentence boundaries, and
extracting names and places. You can find out more about each of these
facilities, and see online demonstrations of each, by choosing an
item from the menu to the left.
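To make the idea of a pipeline of adornment stages concrete, here is a minimal Java sketch. It is not MorphAdorner's actual API: the class name, the `Token` fields, the stage methods, and the two tiny lookup tables are all hypothetical, invented only to show tokens passing through a standard-spelling stage and then a lemma stage.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical sketch; none of these names come from MorphAdorner itself.
class AdornmentPipelineSketch {

    /** A word plus the adornments attached to it so far. */
    static final class Token {
        final String original;
        String standardSpelling;
        String lemma;

        Token(String original) {
            this.original = original;
        }
    }

    // Toy lookup tables standing in for real lexicons (invented for this example).
    private static final Map<String, String> STANDARD_SPELLINGS =
            Map.of("historyes", "histories", "troye", "troy");
    private static final Map<String, String> LEMMATA =
            Map.of("histories", "history");

    /** Splits on whitespace only; a real tokenizer must also handle punctuation. */
    static List<Token> tokenize(String text) {
        List<Token> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(new Token(piece));
            }
        }
        return tokens;
    }

    /** Stage 1: attach a standard spelling, defaulting to the lowercased original. */
    static void standardize(Token token) {
        String key = token.original.toLowerCase(Locale.ROOT);
        token.standardSpelling = STANDARD_SPELLINGS.getOrDefault(key, key);
    }

    /** Stage 2: attach a lemma, defaulting to the standard spelling. */
    static void lemmatize(Token token) {
        token.lemma = LEMMATA.getOrDefault(token.standardSpelling, token.standardSpelling);
    }

    /** Runs each stage, in order, over every token. */
    static List<Token> adorn(String text) {
        List<Consumer<Token>> stages = List.of(
                AdornmentPipelineSketch::standardize,
                AdornmentPipelineSketch::lemmatize);
        List<Token> tokens = tokenize(text);
        for (Consumer<Token> stage : stages) {
            tokens.forEach(stage);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : adorn("Historyes of Troye")) {
            System.out.println(t.original + "\t" + t.standardSpelling + "\t" + t.lemma);
        }
    }
}
```

The order of the stages matters in this sketch: the lemma lookup is keyed on the standard spelling, so standardization has to run first.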
MorphAdorner has undergone continuous development in tandem with
three projects: WordHoard, Monk, and Virtual Orthographic Standardization
and Part of Speech Tagging (VOSPOS), as well as smaller-scale faculty
research projects at Northwestern University.
All three projects are now complete.
Although MorphAdorner was used in these projects, it is
a separate project in its own right.
MorphAdorner saw its heaviest use in the Monk project.
The Monk project sought to adorn a large number of English language texts
from the early Modern English period to the start of the twentieth century.
About 151.5 million words had been adorned
by the project's end in April 2009.
Our efforts to adorn English texts covering a period of over four
hundred years must deal with the fact that the English language has changed
significantly even since the start of the
early modern period around 1470 A.D. The Great Vowel
Shift was only about half complete at that time. Spelling
was not at all standardized. Early printed
texts reflect the differences in pronunciation. In 1475 William Caxton
published (in Bruges) the first book printed in English,
Recuyell of the Historyes of Troye. That
short title reveals the orthographic variety that persisted until
the late eighteenth century.
Today there remain differences among British, American, and Canadian
spellings. Unlike those of the early modern period, these differences are
reasonably regular and generally easy to handle.
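As a rough illustration of why regular regional differences are tractable, the following Java sketch maps a few British or Canadian spellings to American ones with two suffix rules, a short exception list, and a direct lookup for irregular pairs. This is not how MorphAdorner itself handles regional spelling. The class name, the rules, and the word lists are assumptions made for the example, and they are far from complete.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; the rules and word lists are illustrative only.
class RegionalSpellingSketch {

    // Pairs that the suffix rules below do not cover, handled by direct lookup.
    private static final Map<String, String> IRREGULAR = Map.of(
            "grey", "gray",
            "gaol", "jail",
            "odour", "odor");

    // Words ending in "our" that are not regional variants of "-or" words.
    private static final Set<String> OUR_EXCEPTIONS = Set.of(
            "detour", "devour", "contour", "velour", "glamour");

    /** Returns a lowercased American spelling for the given word. */
    static String toAmerican(String word) {
        String lower = word.toLowerCase(Locale.ROOT);
        if (IRREGULAR.containsKey(lower)) {
            return IRREGULAR.get(lower);
        }
        // Rule 1: "-our" becomes "-or" (colour, honour). The length check
        // keeps short words such as "hour" and "flour" unchanged.
        if (lower.length() >= 6 && lower.endsWith("our")
                && !OUR_EXCEPTIONS.contains(lower)) {
            return lower.substring(0, lower.length() - 3) + "or";
        }
        // Rule 2: "-tre" becomes "-ter" (centre, theatre).
        if (lower.length() >= 5 && lower.endsWith("tre")) {
            return lower.substring(0, lower.length() - 3) + "ter";
        }
        return lower;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"colour", "Theatre", "detour", "hour", "grey"}) {
            System.out.println(w + " -> " + toAmerican(w));
        }
    }
}
```

A handful of rules with explicit exceptions goes a long way here because the variation is systematic. Early modern spellings, by contrast, generally need much larger lexicons.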
|