Counting Words In An Adorned Text

CountAdornedWords tabulates counts of adorned words from XMLToTab output files.

Usage:

countadornedwords output.tab input.tab input2.tab ...

where

  • output.tab is the output tab-separated count file.
  • input*.tab are the input tabbed files produced by XMLToTab.

The output file is a tab-delimited, UTF-8 encoded text file containing the following fields, in order:

  1. Short work name, formed from the input file name by stripping the path and file extension.
  2. The corrected original spelling.
  3. The standard spelling.
  4. The parts of speech.
  5. The lemmata.
  6. The count of the tuple (work name, corrected spelling, standard spelling, parts of speech, lemmata).
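
The tabulation these fields describe can be sketched in Python. This is an illustration only, not the MorphAdorner implementation: the names short_work_name and count_tuples are hypothetical, the sample rows are invented, and the rows are assumed to be already reduced to the four word-level fields, since the XMLToTab column layout is not described here.

```python
from collections import Counter
from pathlib import PurePath


def short_work_name(path):
    """Strip the directory path and the file extension from a file name."""
    return PurePath(path).stem


def count_tuples(path, rows):
    """Count (work, spelling, standard, pos, lemma) tuples for one input file.

    `rows` is an iterable of (spelling, standard, pos, lemma) tuples. How
    they are extracted from an XMLToTab file is left out (assumption).
    """
    work = short_work_name(path)
    return Counter((work, *row) for row in rows)


# Hypothetical sample data, not taken from a real adorned text.
sample_rows = [
    ("loue", "love", "n1", "love"),
    ("loue", "love", "vvb", "love"),
    ("loue", "love", "n1", "love"),
]
counts = count_tuples("texts/hamlet.tab", sample_rows)

# Emit one tab-separated line per distinct tuple, count last.
for key, n in sorted(counts.items()):
    print("\t".join(key) + "\t" + str(n))
```

Because the part of speech is part of the key, the same spelling tagged as a noun and as a verb is counted in two separate rows.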

This output provides a "bag of words" for each input text, which can then be loaded into a database or spreadsheet for further analysis.
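
As a rough sketch of such further analysis, the count file can also be read directly with Python's standard csv module. The function name load_bags is hypothetical, the sample rows are invented, and the code assumes the six-field layout listed above; lines that do not fit that layout (for example a header row, if one is present) are skipped.

```python
import csv
import io
from collections import Counter, defaultdict


def load_bags(lines):
    """Return {work: Counter({lemmata: total count})} from six-field rows."""
    bags = defaultdict(Counter)
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank, malformed, or non-numeric-count lines.
        if len(row) != 6 or not row[5].isdigit():
            continue
        work, _spelling, _standard, _pos, lemmata, count = row
        bags[work][lemmata] += int(count)
    return bags


# Hypothetical contents standing in for output.tab.
sample = io.StringIO(
    "hamlet\tloue\tlove\tn1\tlove\t2\n"
    "hamlet\tloue\tlove\tvvb\tlove\t1\n"
    "hamlet\tkyng\tking\tn1\tking\t4\n"
)
bags = load_bags(sample)

# Most frequent lemma in the sample work.
print(bags["hamlet"].most_common(1))
```

For a real file, pass open("output.tab", encoding="utf-8", newline="") in place of the StringIO object. Summing over spellings and parts of speech, as done here, gives per-lemma totals for each work; keeping the full tuple as the key instead would preserve the finer distinctions.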

 

Academic Technologies  NU Library 2East  1970 Campus Drive  Evanston, IL 60208
E-mail: pib@northwestern.edu
Last updated Sun Mar 15 05:52:32 2009. © 2007, 2008 Northwestern University