Counting Affixes In An Adorned Text

CountAffixes counts affixes (suffixes and prefixes) of adorned words by processing MorphAdorned XML output.

Usage:

countaffixes input.xml prefixes.tab suffixes.tab

where

  • input.xml -- input adorned XML file, as produced by MorphAdorner.
  • prefixes.tab -- output tab-separated prefixes file described below.
  • suffixes.tab -- output tab-separated suffixes file described below.

The prefixes.tab and suffixes.tab output files each contain two tab-separated columns. The first column is the prefix or suffix string, and the second column is the number of times that prefix or suffix occurs among the unique words in the input.xml file.
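
The two-column layout is easy to load for further analysis. The following Python sketch is not part of MorphAdorner; it only illustrates reading such a file and listing the most frequent affixes. The function names read_affix_counts and top_affixes are hypothetical, the sample rows are invented for illustration, and the sketch assumes the file has no header row.

```python
import io


def read_affix_counts(lines):
    """Parse tab-separated (affix, count) lines into a dict.

    Assumes no header row; lines without two columns are skipped.
    """
    counts = {}
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # blank or malformed line
        affix, count = fields[0], int(fields[1])
        counts[affix] = counts.get(affix, 0) + count
    return counts


def top_affixes(counts, n=10):
    """Return the n most frequent affixes; ties are broken alphabetically."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


# Invented sample standing in for the contents of a suffixes.tab file.
sample = io.StringIO("ation\t12\nness\t7\nly\t12\n")
counts = read_affix_counts(sample)
print(top_affixes(counts, 2))  # [('ation', 12), ('ly', 12)]
```

To read a real output file, pass an open file object instead of the sample, for example open("suffixes.tab", encoding="utf-8"); the UTF-8 encoding is an assumption.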

Why do we care about affixes? Affixes of one kind or another are a good proxy for etymologies -- at least in English. In some ways they are better, because the affix is part of the writer's or reader's repertoire in a way that knowledge of etymologies is not. The distribution of word etymologies -- or affixes -- offers one way of studying an author's style.

For example, R. Harald Baayen argues that 'ation' is a distinctive suffix and is characteristic of the Latinate and Johnsonian streak in Jane Austen's writing. A study of affix distributions for other authors may reveal similar interesting patterns.
