Package edu.northwestern.at.morphadorner.tools.comparestringcounts

Compare string counts in two files using Dunning's log-likelihood.

Usage:

java edu.northwestern.at.morphadorner.tools.comparestringcounts.CompareStringCounts analysis.tab reference.tab

analysis.tab -- Input tab-separated file of strings and counts for an analysis text.
reference.tab -- Input tab-separated file of strings and counts for a reference text.

The analysis.tab and reference.tab files contain strings and their counts, compiled from two texts or corpora. Each file has two tab-separated columns: the first holds a string, and the second holds the number of times that string occurred in the associated text.
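To make the input format concrete, here is a minimal sketch of parsing such a file into a map of counts. The class and method names (CountFileSketch, parseCounts, total) are hypothetical and are not part of MorphAdorner. Summing the counts of a repeated string and taking a text's total as the sum of all counts in its file are both assumptions of this sketch, not documented behavior of the tool.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of MorphAdorner.
final class CountFileSketch {

    // Parses lines of the form "string<TAB>count" into an insertion-ordered map.
    static Map<String, Long> parseCounts(List<String> lines) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] cols = line.split("\t");
            if (cols.length < 2) {
                throw new IllegalArgumentException("Expected two columns: " + line);
            }
            // Assumption: a string that appears twice has its counts added together.
            counts.merge(cols[0], Long.parseLong(cols[1].trim()), Long::sum);
        }
        return counts;
    }

    // Assumption: the size of a text is the sum of all counts in its file.
    static long total(Map<String, Long> counts) {
        long sum = 0;
        for (long value : counts.values()) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Made-up sample lines, for illustration only.
        Map<String, Long> counts = parseCounts(Arrays.asList("the\t120", "love\t7"));
        System.out.println(counts + " total=" + total(counts));
    }
}
```

For a real file, the lines could be obtained with java.nio.file.Files.readAllLines before being passed to parseCounts.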

The output contains one line for each string in the analysis text, sorted in descending order by log-likelihood value. Each line has seven tab-separated columns:

  1. The first column contains the string. This may be a spelling, a lemma, a part of speech, a spelling bigram, or any other string of interest.
  2. The second column contains a "+" when the string is overused in the analysis text with respect to the reference text, a "-" when the string is underused, and a blank when the string is used at the same rate in both texts.
  3. The third column contains Dunning's log-likelihood value.
  4. The fourth column shows the relative frequency of occurrence of the string in the analysis text as fractional parts per ten thousand.
  5. The fifth column shows the relative frequency of occurrence of the string in the reference text as fractional parts per ten thousand.
  6. The sixth column shows the number of times the string occurred in the analysis text.
  7. The seventh column shows the number of times the string occurred in the reference text.
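The values in columns two through five can be sketched as follows. This uses one common two-term formulation of Dunning's log-likelihood for comparing two corpora, where a and b are the counts of a string in the analysis and reference texts and c and d are the total counts in each text. Whether the tool uses exactly this formulation, and whether it decides "+" and "-" by comparing relative frequencies, are assumptions of this sketch. The class and method names are hypothetical.

```java
// Hypothetical helper, not part of MorphAdorner.
final class LogLikelihoodSketch {

    // One term of the statistic: observed * ln(observed / expected),
    // with the convention that a zero count contributes nothing.
    static double term(double observed, double expected) {
        return observed == 0.0 ? 0.0 : observed * Math.log(observed / expected);
    }

    // a, b: counts of the string in the analysis and reference texts.
    // c, d: total counts in the analysis and reference texts (assumed nonzero).
    static double logLikelihood(long a, long b, long c, long d) {
        double e1 = (double) c * (a + b) / (c + d); // expected analysis count
        double e2 = (double) d * (a + b) / (c + d); // expected reference count
        return 2.0 * (term(a, e1) + term(b, e2));
    }

    // Relative frequency expressed in parts per ten thousand.
    static double perTenThousand(long count, long total) {
        return 10000.0 * count / total;
    }

    // Assumption: over- or underuse is decided by comparing relative frequencies.
    static String useMarker(long a, long b, long c, long d) {
        int cmp = Double.compare((double) a / c, (double) b / d);
        return cmp > 0 ? "+" : (cmp < 0 ? "-" : " ");
    }

    public static void main(String[] args) {
        long a = 20, b = 10, c = 1000, d = 1000; // made-up counts for illustration
        System.out.println(useMarker(a, b, c, d) + "\t"
                + logLikelihood(a, b, c, d) + "\t"
                + perTenThousand(a, c) + "\t"
                + perTenThousand(b, d));
    }
}
```

When a string has the same relative frequency in both texts, the expected counts equal the observed counts and the statistic is zero.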

These results are written to standard output, which can be redirected to a file. A brief summary of the analysis is written to standard error.
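The split between the two streams can be sketched as below: rows are sorted by descending log-likelihood and printed to standard output as seven tab-separated columns, while a summary line goes to standard error. The Row type, the method names, the numeric precision, and the wording of the summary are all assumptions made for illustration. The tool's actual output format may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of MorphAdorner.
final class OutputSketch {

    // One output line: the seven columns described above.
    static final class Row {
        final String text;
        final String marker;
        final double logLikelihood;
        final double analysisPer10k;
        final double referencePer10k;
        final long analysisCount;
        final long referenceCount;

        Row(String text, String marker, double logLikelihood,
            double analysisPer10k, double referencePer10k,
            long analysisCount, long referenceCount) {
            this.text = text;
            this.marker = marker;
            this.logLikelihood = logLikelihood;
            this.analysisPer10k = analysisPer10k;
            this.referencePer10k = referencePer10k;
            this.analysisCount = analysisCount;
            this.referenceCount = referenceCount;
        }
    }

    // Formats a row as seven tab-separated columns (precision is an assumption).
    static String format(Row r) {
        return String.format(Locale.ROOT, "%s\t%s\t%.2f\t%.4f\t%.4f\t%d\t%d",
                r.text, r.marker, r.logLikelihood,
                r.analysisPer10k, r.referencePer10k,
                r.analysisCount, r.referenceCount);
    }

    // Returns a copy of the rows sorted by descending log-likelihood.
    static List<Row> sortedByScore(List<Row> rows) {
        List<Row> copy = new ArrayList<>(rows);
        copy.sort(Comparator.comparingDouble((Row r) -> r.logLikelihood).reversed());
        return copy;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only.
        List<Row> rows = Arrays.asList(
                new Row("the", " ", 0.0, 100.0, 100.0, 10, 10),
                new Row("love", "+", 3.4, 200.0, 100.0, 20, 10));
        for (Row r : sortedByScore(rows)) {
            System.out.println(format(r));                 // results: standard output
        }
        System.err.println("Strings compared: " + rows.size()); // summary: standard error
    }
}
```

Because the two streams are separate, redirecting standard output to a file captures only the result rows, and the summary still appears on the console.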