  Relemmatizing an Adorned File
 
 

Relemmatize updates the lemmata and standard spellings in XML files previously adorned by MorphAdorner.

Usage:

relemmatize lexicon.lex spellingmap.tab spellingsbywordclass.txt standardspellings.txt outputdirectory adornedinput.xml adornedinput2.xml ...

where

  • lexicon.lex -- Input MorphAdorner lexicon file.
  • spellingmap.tab -- Two-column, tab-separated spelling map file. The first column is a variant spelling and the second column is the corresponding standard spelling.
  • spellingsbywordclass.txt -- A spelling map file which breaks down the variant-to-standard spelling mappings by word class.
  • standardspellings.txt -- File containing standard known spellings.
  • outputdirectory -- Output directory for updated MorphAdorner adorned XML files.
  • adornedinput*.xml -- One or more adorned XML files previously produced by MorphAdorner, used as input to relemmatize.
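As an illustration of the two-column spelling map format described above, the following is a minimal Java sketch that turns tab-separated lines into a variant-to-standard lookup table. The class name SpellingMapSketch, the method name parse, and the sample spellings are hypothetical and are not part of MorphAdorner; the handling of blank or malformed lines is likewise an assumption, not documented behavior.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not MorphAdorner code: parses variant<TAB>standard lines.
class SpellingMapSketch {

    /** Builds a variant -> standard spelling map from tab-separated lines. */
    static Map<String, String> parse(List<String> lines) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String line : lines) {
            String[] cols = line.split("\t");
            // Assumption: lines without two columns are skipped silently.
            if (cols.length < 2) {
                continue;
            }
            String variant = cols[0].trim();
            String standard = cols[1].trim();
            if (variant.isEmpty() || standard.isEmpty()) {
                continue;
            }
            map.put(variant, standard);
        }
        return map;
    }

    public static void main(String[] args) {
        // Made-up sample entries; a real map would be read from spellingmap.tab,
        // for example with java.nio.file.Files.readAllLines.
        List<String> sample = Arrays.asList("loue\tlove", "vertue\tvirtue");
        Map<String, String> map = parse(sample);
        System.out.println(map.get("loue"));
    }
}
```

Keeping the parser independent of file access makes it easy to try on a few lines before pointing it at a full spelling map.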

The MorphAdorner release provides two specialized versions of the relemmatize command. To relemmatize using the Early Modern English data:

relemmatizeeme outputdirectory adornedinput.xml adornedinput2.xml ...

To relemmatize using the Nineteenth Century Fiction data:

relemmatizencf outputdirectory adornedinput.xml adornedinput2.xml ...

The lemmata and standard spellings for each adorned word in the input XML files are updated with the most current values from the lexicon and spelling data. The updated XML files are written to outputdirectory.

The source code for Relemmatize provides an example of reading an adorned XML file and modifying it using a SAX filter.
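The general technique, rewriting attributes of word elements with a SAX filter, can be sketched as follows. This is not the Relemmatize source. It assumes that adorned words are w elements carrying a lem attribute, and it replaces lemmata from a hypothetical in-memory map keyed by the old lemma; the real program derives new values from the lexicon and spelling files. The class name LemmaFilterSketch and the method rewrite are invented for this example.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch, not MorphAdorner code: rewrites the "lem" attribute on <w> elements.
class LemmaFilterSketch extends XMLFilterImpl {
    private final Map<String, String> newLemmata;

    LemmaFilterSketch(XMLReader parent, Map<String, String> newLemmata) {
        super(parent);
        this.newLemmata = newLemmata;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        // Assumption: adorned words are <w> elements with a "lem" attribute.
        if ("w".equals(qName)) {
            int i = atts.getIndex("lem");
            String replacement = (i >= 0) ? newLemmata.get(atts.getValue(i)) : null;
            if (replacement != null) {
                // Copy the attributes so the parser's own object is left untouched.
                AttributesImpl copy = new AttributesImpl(atts);
                copy.setValue(i, replacement);
                atts = copy;
            }
        }
        // All events, changed or not, are passed on to the next handler.
        super.startElement(uri, localName, qName, atts);
    }

    /** Runs the filter over an XML string and returns the serialized result. */
    static String rewrite(String xml, Map<String, String> newLemmata) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            LemmaFilterSketch filter = new LemmaFilterSketch(reader, newLemmata);
            StringWriter out = new StringWriter();
            // An identity transform serializes the filtered SAX events back to XML.
            TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not rewrite XML", e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> updates = new HashMap<>();
        updates.put("loue", "love"); // made-up correction for illustration
        String xml = "<text><w lem=\"loue\" reg=\"love\">loue</w></text>";
        System.out.println(rewrite(xml, updates));
    }
}
```

Because the filter only touches the events it cares about and forwards everything else unchanged, the rest of the document passes through as is, and files can be processed as a stream without building a full tree in memory.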

 

Academic Technologies  NU Library 2East  1970 Campus Drive  Evanston, IL 60208
E-mail: pib@northwestern.edu
Last updated Mon Mar 30 14:10:10 2009. © 2007, 2008 Northwestern University