Counting Affixes In An Adorned Text

CountAffixes counts the affixes (prefixes and suffixes) of adorned words by processing the XML output produced by MorphAdorner.


countaffixes input.xml


  • input.xml -- input XML file produced as output by MorphAdorner.
  • prefixes output file -- output tab-separated file of prefix counts, described below.
  • suffixes output file -- output tab-separated file of suffix counts, described below.

Both the prefixes and suffixes output files contain two tab-separated columns. The first column holds a prefix or suffix string, respectively, and the second column holds the number of times that prefix or suffix occurs among the unique words in the input.xml file.
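The counting described above can be sketched in Python. This is a minimal sketch, not CountAffixes' own implementation: it assumes adorned words appear as `<w>` elements whose text is the word, and it uses short hypothetical `PREFIXES` and `SUFFIXES` lists, because the affix inventory CountAffixes actually uses is not documented here. Each affix is counted at most once per unique word, and each output file gets the two tab-separated columns described above.

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical affix lists for illustration only; CountAffixes' real lists are not shown here.
PREFIXES = ["un", "re", "dis", "pre", "mis"]
SUFFIXES = ["ation", "ness", "ment", "ly", "ing"]

def unique_words(root):
    """Collect the distinct lowercased texts of <w> elements (assumed word markup)."""
    words = set()
    for elem in root.iter():
        # Drop any namespace so both "w" and "{namespace}w" match.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "w" and elem.text:
            word = elem.text.strip().lower()
            if word:
                words.add(word)
    return words

def count_affixes(words, affixes, match):
    """Count, for each affix, how many unique words carry it (match is startswith or endswith)."""
    counts = Counter()
    for word in words:
        for affix in affixes:
            # Require at least one extra character so an affix is never the whole word.
            if len(word) > len(affix) and match(word, affix):
                counts[affix] += 1
    return counts

def to_tsv(counts):
    """Format counts as 'affix<TAB>count' lines, sorted by affix."""
    return "".join(f"{affix}\t{n}\n" for affix, n in sorted(counts.items()))

def main(argv):
    # Hypothetical argument order: input XML, prefixes output file, suffixes output file.
    input_path, prefix_path, suffix_path = argv[1:4]
    words = unique_words(ET.parse(input_path).getroot())
    for path, affixes, match in ((prefix_path, PREFIXES, str.startswith),
                                 (suffix_path, SUFFIXES, str.endswith)):
        with open(path, "w", encoding="utf-8") as out:
            out.write(to_tsv(count_affixes(words, affixes, match)))

# Run only when given all three paths on the command line.
if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv)
```

For the words "creation", "relation", "unkind", "Creation" and "kindness", the sketch sees four unique words, so "ation" is counted twice, not three times. Plain string matching is deliberately naive: it would also count "nation" as ending in "ation", which a real morphological analysis would reject.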

Why do we care about affixes? Affixes of one kind or another are a good proxy for etymologies, at least in English. In some ways they are better, because affixes are part of a writer's or reader's working repertoire in a way that knowledge of etymologies is not. The distribution of word etymologies, or of affixes, offers one way of studying an author's style.

For example, R. Harald Baayen argues that 'ation' is a distinctive suffix, characteristic of the Latinate and Johnsonian streak in Jane Austen's writing. Studying affix distributions for other authors may reveal similar patterns.

