|
CountAffixes
counts the affixes (prefixes and suffixes) of adorned words by
processing XML output produced by MorphAdorner.
Usage:
countaffixes input.xml prefixes.tab suffixes.tab
where
- input.xml -- input XML file produced
by MorphAdorner.
- prefixes.tab -- output tab-separated prefixes file
described below.
- suffixes.tab -- output tab-separated suffixes file
described below.
Both the prefixes.tab and suffixes.tab output files
contain two tab-separated columns. The first column is a prefix or
suffix string, respectively, and the second column is the number of
unique words in the input.xml file in which that prefix or suffix
occurs.
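The counting rule and the two-column output format can be illustrated with a
minimal Python sketch. This is not CountAffixes itself: the affix lists, the
function names count_affixes and to_tab_lines, the case folding, and the
simple starts-with/ends-with matching are all assumptions made for
illustration, and the words are supplied directly rather than read from
adorned XML.

```python
from collections import Counter

# Hypothetical affix lists, for illustration only; the real tool's
# affix inventory and matching rules are not described here.
PREFIXES = ("un", "re", "dis")
SUFFIXES = ("ation", "ness", "ly")


def count_affixes(words, prefixes=PREFIXES, suffixes=SUFFIXES):
    """Count each affix once per unique (case-folded) word."""
    unique_words = {w.lower() for w in words}
    prefix_counts = Counter()
    suffix_counts = Counter()
    for word in unique_words:
        for p in prefixes:
            # The affix must be strictly shorter than the word.
            if len(word) > len(p) and word.startswith(p):
                prefix_counts[p] += 1
        for s in suffixes:
            if len(word) > len(s) and word.endswith(s):
                suffix_counts[s] += 1
    return prefix_counts, suffix_counts


def to_tab_lines(counts):
    """Render counts as 'affix<TAB>count' lines, sorted by affix."""
    return [f"{affix}\t{count}" for affix, count in sorted(counts.items())]


words = ["Observation", "observation", "relation", "unhappy",
         "happiness", "quickly", "reread"]
prefix_counts, suffix_counts = count_affixes(words)
prefix_lines = to_tab_lines(prefix_counts)
suffix_lines = to_tab_lines(suffix_counts)
```

Because counts are taken over unique words, the repeated "Observation" and
"observation" contribute only once to the count for "ation".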
Why do we care about affixes?
Affixes of one kind or another are a
good proxy for etymologies -- at least in English. In some ways they
are better, because the affix is part of the writer's or reader's repertoire
in a way in which knowledge of etymologies is not.
The distribution of word etymologies -- or affixes -- offers one
way of studying an author's style.
For example, R. Harald Baayen argues that 'ation' is a distinctive suffix
and is characteristic of the Latinate and Johnsonian streak in Jane Austen's
writing. A study of affix distributions for other authors may reveal
similar interesting patterns.
|