|
Relemmatize
updates lemmata and standard spellings in MorphAdorned XML files.
Usage:
relemmatize lexicon.lex spellingmap.tab spellingsbywordclass.txt standardspellings.txt outputdirectory adornedinput.xml adornedinput2.xml ...
where
- lexicon.lex -- Input MorphAdorner lexicon file.
- spellingmap.tab -- Two column tab-separated spelling map file.
First column is a variant spelling and the second column is
the standard spelling.
- spellingsbywordclass.txt -- A spelling map file that maps variant
spellings to standard spellings separately for each word class.
- standardspellings.txt -- File containing standard known spellings.
- outputdirectory -- Output directory for updated MorphAdorner adorned XML files.
- adornedinput*.xml -- One or more adorned XML files previously produced by MorphAdorner, to be updated.
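As an illustration of the two-column spelling map format described above, the following is a minimal Java sketch that reads variant/standard pairs from tab-separated lines. It is not taken from the MorphAdorner source: the class name SpellingMapSketch and its methods are hypothetical, and the handling of malformed lines (skipping them) is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of MorphAdorner.
final class SpellingMapSketch {

    // Reads "variant<TAB>standard" lines into a map keyed by variant spelling.
    static Map<String, String> parse(BufferedReader reader) {
        Map<String, String> map = new LinkedHashMap<>();
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] columns = line.split("\t", -1);
                // Assumption: lines without two columns are skipped.
                if (columns.length < 2 || columns[0].isEmpty()) {
                    continue;
                }
                map.put(columns[0], columns[1]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return map;
    }

    // Returns the standard spelling, or the input itself if no mapping exists.
    static String standardize(Map<String, String> map, String spelling) {
        return map.getOrDefault(spelling, spelling);
    }

    public static void main(String[] args) {
        String sample = "loue\tlove\nvertue\tvirtue\n";
        Map<String, String> map = parse(new BufferedReader(new StringReader(sample)));
        System.out.println(standardize(map, "loue"));
    }
}
```

To read a real file, the same parse method could be given a reader from java.nio.file.Files.newBufferedReader.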
The MorphAdorner release provides two specialized versions of the
relemmatize command. To relemmatize using the Early Modern English
data:
relemmatizeeme outputdirectory adornedinput.xml adornedinput2.xml ...
To relemmatize using the Nineteenth Century Fiction data:
relemmatizencf outputdirectory adornedinput.xml adornedinput2.xml ...
The lemmata and standard spellings for each adorned word in the input XML
files are updated with the most current values. The updated XML files
are written to outputdirectory.
The source code for Relemmatize provides an example of reading
an adorned XML file and modifying it using a SAX filter.
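The general SAX filter technique can be sketched as follows. This is not the Relemmatize source code, only a minimal illustration built on the standard org.xml.sax.helpers.XMLFilterImpl class. It assumes that adorned words are w elements carrying spe (spelling) and lem (lemma) attributes, and it uses a plain in-memory map in place of a MorphAdorner lexicon. The class name LemmaFilterSketch and its relemmatize method are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch of a SAX filter that rewrites lemma attributes.
final class LemmaFilterSketch extends XMLFilterImpl {

    private final Map<String, String> lemmaBySpelling;

    LemmaFilterSketch(XMLReader parent, Map<String, String> lemmaBySpelling) {
        super(parent);
        this.lemmaBySpelling = lemmaBySpelling;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) throws SAXException {
        // Assumption: adorned words are <w> elements with "spe" and "lem" attributes.
        if ("w".equals(qName)) {
            String newLemma = lemmaBySpelling.get(atts.getValue("spe"));
            int lemIndex = atts.getIndex("lem");
            if (newLemma != null && lemIndex >= 0) {
                AttributesImpl updated = new AttributesImpl(atts);
                updated.setValue(lemIndex, newLemma);
                atts = updated;
            }
        }
        // All other events pass through XMLFilterImpl unchanged.
        super.startElement(uri, localName, qName, atts);
    }

    // Parses the XML through the filter and serializes the filtered events.
    static String relemmatize(String xml, Map<String, String> lemmata) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            LemmaFilterSketch filter = new LemmaFilterSketch(reader, lemmata);
            StringWriter out = new StringWriter();
            // An identity transform writes out the events emitted by the filter.
            TransformerFactory.newInstance().newTransformer().transform(
                new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not filter XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<text><w spe=\"loue\" lem=\"loue\">loue</w></text>";
        System.out.println(relemmatize(xml, Map.of("loue", "love")));
    }
}
```

Because the filter works on the event stream, the document is never held in memory as a tree, which suits large adorned files.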
|