Verticalizing An Adorned Text
 
 

XMLToTab converts MorphAdorner XML output to tab-separated tabular form.

Usage:

xmltotab input.xml output.tab

where

  • input.xml is the input XML file, as adorned by MorphAdorner.
  • output.tab is the output tab-separated values file.

The attribute values for each <w> element in the input XML file are extracted and written to a tab-separated values text file. Each output line describes a single word <w> element and contains the following fields, in order.

  1. The work ID.
  2. The permanent word ID.
  3. The corrected original spelling.
  4. The corrected original spelling reversed.
  5. The standard spelling.
  6. The lemma.
  7. The part of speech.
  8. An XPath-like path to this word. The leading work ID and trailing word number are removed from the path.
  9. The end-of-sentence flag: 1 if this word ends a sentence, 0 otherwise.
  10. The previous word's original spelling.
  11. The next word's original spelling.
  12. Up to 80 characters of text preceding the word in the text.
  13. Up to 80 characters of text following the word in the text.
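
As a minimal sketch, the thirteen fields above could be read in Python as shown here. It assumes the output file has no header row and holds exactly these fields in this order. The names WordRow and read_rows, and the field labels such as work_id and left_context, are invented for illustration and are not defined by MorphAdorner.

```python
import csv
from typing import Iterator, NamedTuple, TextIO


class WordRow(NamedTuple):
    """One output line; field labels are illustrative, not MorphAdorner names."""
    work_id: str
    word_id: str
    spelling: str            # corrected original spelling
    reversed_spelling: str   # corrected original spelling, reversed
    standard_spelling: str
    lemma: str
    pos: str                 # part of speech tag
    path: str                # XPath-like path to the word
    eos: str                 # "1" if the word ends a sentence, else "0"
    prev_spelling: str
    next_spelling: str
    left_context: str        # up to 80 characters before the word
    right_context: str       # up to 80 characters after the word


def read_rows(handle: TextIO) -> Iterator[WordRow]:
    """Yield one WordRow per well-formed line of a tab-separated file."""
    # QUOTE_NONE keeps quotation marks in the text as literal characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Assumption: lines without exactly 13 fields are skipped, not repaired.
        if len(fields) != len(WordRow._fields):
            continue
        yield WordRow(*fields)
```

Passing read_rows a file opened with open("output.tab", encoding="utf-8", newline="") would yield one WordRow per word. The UTF-8 encoding is an assumption about the output file.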

This tabular representation of an adorned XML text is useful for checking data. The morphological attribute values for each word <w> element appear as columns. The 80 characters (or so) of text on either side of each word let you focus on particular part of speech tags and pinpoint errors made by the automatic adornment process. The tab-separated values may also be used to construct spreadsheets or databases of the individual word information.
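
The kind of checking described above could be sketched in Python roughly as follows: select every word carrying a given part of speech tag and show it with its surrounding text. This assumes the thirteen-field layout listed earlier, with no header row. The function names words_tagged and kwic are hypothetical helpers, not part of MorphAdorner, and the tag you pass depends on the tag set used to adorn the text.

```python
import csv
from typing import Iterable, Iterator, List

# Zero-based column positions, assuming the field order listed above.
SPELLING, POS, LEFT, RIGHT = 2, 6, 11, 12
FIELD_COUNT = 13


def words_tagged(lines: Iterable[str], tag: str) -> Iterator[List[str]]:
    """Yield rows whose part of speech column equals the given tag."""
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for fields in reader:
        # Lines that do not have the expected number of fields are ignored.
        if len(fields) == FIELD_COUNT and fields[POS] == tag:
            yield fields


def kwic(fields: List[str]) -> str:
    """Format one row as: left context, [word], right context."""
    return f"{fields[LEFT]} [{fields[SPELLING]}] {fields[RIGHT]}"
```

Iterating over words_tagged(handle, tag) for an open output file and printing kwic(fields) for each row gives a simple keyword-in-context listing that can be scanned for mis-tagged words.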

 

Academic Technologies  NU Library 2East  1970 Campus Drive  Evanston, IL 60208
E-mail: pib@northwestern.edu
Last updated Sun Mar 15 05:52:42 2009. © 2007, 2008 Northwestern University