Northwestern University Information Technology
MorphAdorner Northwestern
 
Counting Words In An Adorned Text

CountAdornedWords tabulates counts of adorned words from XMLToTab output files.

Usage:

countadornedwords output.tab input.tab input2.tab ...

where

  • output.tab is the output tab-separated count file.
  • input*.tab are the input tabbed files produced by XMLToTab.

The output file is a tab-delimited, UTF-8 encoded text file containing the following fields, in order:

  1. Short work name, formed from the input file name by stripping the path and file extension.
  2. The corrected original spelling.
  3. The standard spelling.
  4. The parts of speech.
  5. The lemmata.
  6. The count of the tuple (work name, corrected spelling, standard spelling, parts of speech, lemmata).
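
As an illustration of the field layout above, the following Python sketch reads a count file into typed records. It is not part of MorphAdorner: the names `WordCount` and `read_counts` are hypothetical, the sample rows (including the part-of-speech tags) are invented for demonstration, and the code assumes every data line holds exactly six tab-separated fields ending in an integer count.

```python
import io
from typing import Iterable, Iterator, NamedTuple


class WordCount(NamedTuple):
    """One output row; field names are illustrative, not MorphAdorner's."""
    work: str       # 1. short work name
    spelling: str   # 2. corrected original spelling
    standard: str   # 3. standard spelling
    pos: str        # 4. parts of speech
    lemmata: str    # 5. lemmata
    count: int      # 6. count of the tuple


def read_counts(lines: Iterable[str]) -> Iterator[WordCount]:
    """Yield a WordCount for each well-formed line of a count file."""
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        # Assumption: skip anything that is not six fields ending in a
        # non-negative integer (blank lines, or a header row if one exists).
        if len(fields) != 6 or not fields[5].isdigit():
            continue
        yield WordCount(*fields[:5], int(fields[5]))


# Invented sample data standing in for a real output.tab file.
sample = io.StringIO(
    "hamlet\tloue\tlove\tvvi\tlove\t3\n"
    "hamlet\tking\tking\tn1\tking\t12\n"
)
rows = list(read_counts(sample))
```

For a real file, pass an open handle instead of the sample, for example `open("output.tab", encoding="utf-8")`.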

This output provides a "bag of words" for each input text, which can then be loaded into a database or spreadsheet for further analysis.
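
One possible way to carry out such an analysis is sketched below using Python's standard `sqlite3` module. The table name `word_counts`, its column names, the helper functions, and the sample rows are all assumptions made for this example; MorphAdorner does not prescribe a database schema.

```python
import sqlite3


def load_counts(conn, rows):
    """Insert six-field count rows into a hypothetical word_counts table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS word_counts ("
        "work TEXT, spelling TEXT, standard TEXT, "
        "pos TEXT, lemmata TEXT, n INTEGER)"
    )
    conn.executemany(
        "INSERT INTO word_counts VALUES (?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()


def lemma_totals(conn, work):
    """Return (lemmata, total count) pairs for one work, largest first."""
    cursor = conn.execute(
        "SELECT lemmata, SUM(n) AS total FROM word_counts "
        "WHERE work = ? GROUP BY lemmata ORDER BY total DESC, lemmata",
        (work,),
    )
    return cursor.fetchall()


# Invented rows in the documented field order; real rows would come
# from the count file written by countadornedwords.
sample_rows = [
    ("hamlet", "loue", "love", "vvi", "love", 3),
    ("hamlet", "love", "love", "vvi", "love", 2),
    ("hamlet", "king", "king", "n1", "king", 12),
]

conn = sqlite3.connect(":memory:")
load_counts(conn, sample_rows)
totals = lemma_totals(conn, "hamlet")
```

Summing over the lemmata column merges variant spellings of the same word, which is the kind of question the per-tuple counts are suited to answer.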
