Relemmatizing an Adorned File

Relemmatize updates lemmata and standard spellings in MorphAdorned XML files.

Usage:

relemmatize lexicon.lex spellingmap.tab spellingsbywordclass.txt standardspellings.txt outputdirectory adornedinput.xml adornedinput2.xml ...

where

  • lexicon.lex -- Input MorphAdorner lexicon file.
  • spellingmap.tab -- Two column tab-separated spelling map file. First column is a variant spelling and the second column is the standard spelling.
  • spellingsbywordclass.txt -- A spelling map file which breaks down the variant-to-standard spelling mappings by word class.
  • standardspellings.txt -- File containing standard known spellings.
  • outputdirectory -- Output directory for updated MorphAdorner adorned XML files.
  • adornedinput*.xml -- One or more adorned XML files, as produced by a previous MorphAdorner run, to be relemmatized.
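
The two-column layout described for spellingmap.tab can be read with very little code. The following is a minimal sketch only, not MorphAdorner's own loader: the class name SpellingMapSketch, the parse method, the choice to skip malformed lines, and the sample spellings are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of MorphAdorner: parses lines in the
// "variant<TAB>standard" layout described for spellingmap.tab.
final class SpellingMapSketch {

    // Builds a map from variant spelling to standard spelling.
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String line : lines) {
            String[] columns = line.split("\t");
            // Assumption: lines without two columns are silently skipped.
            if (columns.length < 2) {
                continue;
            }
            String variant = columns[0].trim();
            String standard = columns[1].trim();
            if (variant.isEmpty() || standard.isEmpty()) {
                continue;
            }
            map.put(variant, standard);
        }
        return map;
    }

    public static void main(String[] args) {
        // Made-up sample entries, used only to exercise the parser.
        List<String> sample = Arrays.asList("vertue\tvirtue", "loue\tlove");
        System.out.println(parse(sample));
    }
}
```

For a real file, the lines could be obtained with java.nio.file.Files.readAllLines before calling parse.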

The MorphAdorner release provides two specialized versions of the relemmatize command. To relemmatize using the Early Modern English data:

relemmatizeeme outputdirectory adornedinput.xml adornedinput2.xml ...

To relemmatize using the Nineteenth Century Fiction data:

relemmatizencf outputdirectory adornedinput.xml adornedinput2.xml ...

The lemma and standard spelling of each adorned word in the input XML files are updated to the most current values. The updated XML files are written to outputdirectory.

The source code for Relemmatize provides an example of reading an adorned XML file and modifying it using a SAX filter.
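
The general shape of such a SAX filter can be sketched as follows. This is not the Relemmatize source. It assumes, for illustration only, that adorned words are w elements carrying a lem attribute, and that updated lemmata are supplied as a simple old-to-new map. The names LemmaFilterSketch, LemmaFilter, and rewrite are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch, not MorphAdorner code.
final class LemmaFilterSketch {

    // Passes all SAX events through unchanged, except that the "lem"
    // attribute of <w> elements is replaced when the map has a newer value.
    // The element and attribute names are assumptions for this example.
    static final class LemmaFilter extends XMLFilterImpl {
        private final Map<String, String> updatedLemmata;

        LemmaFilter(XMLReader parent, Map<String, String> updatedLemmata) {
            super(parent);
            this.updatedLemmata = updatedLemmata;
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) throws SAXException {
            Attributes result = atts;
            int index = "w".equals(localName) ? atts.getIndex("lem") : -1;
            if (index >= 0) {
                String updated = updatedLemmata.get(atts.getValue(index));
                if (updated != null) {
                    // Copy first: the parser may reuse its Attributes object.
                    AttributesImpl copy = new AttributesImpl(atts);
                    copy.setValue(index, updated);
                    result = copy;
                }
            }
            super.startElement(uri, localName, qName, result);
        }
    }

    // Parses the XML through the filter and serializes the filtered events
    // with an identity transformer.
    static String rewrite(String xml, Map<String, String> updatedLemmata) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            LemmaFilter filter = new LemmaFilter(reader, updatedLemmata);

            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not filter XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up input, used only to exercise the filter.
        String xml = "<text><w lem=\"loue\">loues</w></text>";
        System.out.println(rewrite(xml, Collections.singletonMap("loue", "love")));
    }
}
```

Because the filter works on a stream of SAX events, the document is never held in memory as a whole, which suits large adorned files.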
