public class CompareStringCounts extends java.lang.Object
java edu.northwestern.at.morphadorner.tools.comparestringcounts.CompareStringCounts analysis.tab reference.tab
analysis.tab -- Input tab-separated file of strings and counts for an analysis text.
reference.tab -- Input tab-separated file of strings and counts for a reference text.
The analysis.tab and reference.tab files contain strings, and counts of those strings, compiled from two texts or corpora. Both files contain two tab-separated columns. The first column is a string. The second column is the number of times that string occurs in the associated text.
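The input format can be sketched as a small reader that turns "string<TAB>count" lines into a map and reports malformed lines on standard error. This is a minimal sketch under stated assumptions, not MorphAdorner's code: the class CountFileSketch and its methods are hypothetical, summing the counts of a repeated string is an assumption, and so is taking a text's total word count to be the sum of all counts in its file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

public class CountFileSketch {

    /**
     * Reads "string<TAB>count" lines into an insertion-ordered map.
     * Malformed lines are reported on standard error and skipped.
     * Assumption: counts for a repeated string are summed.
     */
    public static Map<String, Integer> readCounts(Reader source) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (line.isEmpty()) {
                    continue;
                }
                String[] fields = line.split("\t");
                if (fields.length != 2) {
                    System.err.println("Line " + lineNumber + ": expected 2 tab-separated columns");
                    continue;
                }
                try {
                    int count = Integer.parseInt(fields[1].trim());
                    counts.merge(fields[0], count, Integer::sum);
                } catch (NumberFormatException e) {
                    System.err.println("Line " + lineNumber + ": bad count \"" + fields[1] + "\"");
                }
            }
        }
        return counts;
    }

    /** Assumed total word count for a text: the sum of all counts in its file. */
    public static int totalCount(Map<String, Integer> counts) {
        int total = 0;
        for (int c : counts.values()) {
            total += c;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // The third line is malformed and is reported on standard error.
        Map<String, Integer> counts = readCounts(new StringReader("the\t120\nwhale\t35\nbad line\n"));
        System.out.println(counts + " total=" + totalCount(counts));
    }
}
```

Reading from a java.io.Reader rather than a file name keeps the parser testable with in-memory text; a real tool would pass a FileReader or similar for analysis.tab and reference.tab.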
The output contains seven tab-separated columns, with one line for each string in the analysis text. The lines are sorted in descending order of log-likelihood value.
The results are written to standard output, which can be redirected to a file. A brief summary of the analysis, and any errors found in the input files, are written to standard error.
|Modifier and Type|Class and Description|
|class|CompareStringCounts.ReverseScoredString: ScoredString modified to sort results from highest to lowest.|
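Reversing a class's natural ordering is what lets a sorted map iterate from the highest score down. The sketch below illustrates the idea with a hypothetical stand-in for ScoredString (a string plus a double score, ordered by ascending score); the real ScoredString API is not described on this page, so every member shown here is an assumption.

```java
import java.util.Map;
import java.util.TreeMap;

public class ReverseOrderSketch {

    /** Hypothetical stand-in for ScoredString: ordered by ascending score. */
    static class ScoredString implements Comparable<ScoredString> {
        final String string;
        final double score;

        ScoredString(String string, double score) {
            this.string = string;
            this.score = score;
        }

        @Override
        public int compareTo(ScoredString other) {
            int c = Double.compare(score, other.score);
            // Tie-break on the string so distinct strings with equal scores stay distinct map keys.
            return c != 0 ? c : string.compareTo(other.string);
        }
    }

    /** Reverses the natural order so the highest score sorts first. */
    static class ReverseScoredString extends ScoredString {
        ReverseScoredString(String string, double score) {
            super(string, score);
        }

        @Override
        public int compareTo(ScoredString other) {
            // Negate the ascending comparison without risking integer overflow.
            return Integer.compare(0, super.compareTo(other));
        }
    }

    public static void main(String[] args) {
        Map<ReverseScoredString, double[]> results = new TreeMap<>();
        results.put(new ReverseScoredString("alpha", 1.5), new double[] {1.5});
        results.put(new ReverseScoredString("beta", 12.0), new double[] {12.0});
        results.put(new ReverseScoredString("gamma", 4.2), new double[] {4.2});
        // Iterates beta, gamma, alpha: highest score first.
        for (ReverseScoredString key : results.keySet()) {
            System.out.println(key.string + "\t" + key.score);
        }
    }
}
```

Putting the reversal in the key type means any TreeMap keyed on it is already in display order, so no separate sort step is needed when the results are printed.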
|Constructor and Description|
|CompareStringCounts(java.lang.String[] args): Supervises comparing string counts in two files.|
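|Modifier and Type|Method and Description|
|static void|displayResults(java.util.Map<CompareStringCounts.ReverseScoredString,double[]> results): Displays results of the frequency analysis in a sorted table.|
|static void|displayUsage(): Displays brief program usage.|
|static double[]|doFreq(java.lang.String stringToAnalyze, int analysisCount, int analysisTotalCount, int refCount, int refTotalCount): Compares the frequency of a word in the analysis and reference works.|
|static void|main(java.lang.String[] args)|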
public CompareStringCounts(java.lang.String[] args)
args - Command line arguments.
public static void main(java.lang.String[] args)
public static void displayUsage()
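The command line at the top of this page takes exactly two file names, so a main method would typically check the argument count and fall back to displayUsage() otherwise. This is a hypothetical sketch: the usage wording, the choice of standard error, and the exit status are assumptions, since the page only says displayUsage() displays brief program usage.

```java
public class UsageSketch {

    /** Usage text modeled on the documented command line (wording is an assumption). */
    static String usageText() {
        return String.join(System.lineSeparator(),
                "Usage: java edu.northwestern.at.morphadorner.tools.comparestringcounts.CompareStringCounts analysis.tab reference.tab",
                "  analysis.tab  -- Input tab-separated file of strings and counts for an analysis text.",
                "  reference.tab -- Input tab-separated file of strings and counts for a reference text.");
    }

    /** Displays brief program usage on standard error (stream choice is an assumption). */
    public static void displayUsage() {
        System.err.println(usageText());
    }

    /** True when exactly the two required file names were supplied (hypothetical helper). */
    static boolean hasRequiredArgs(String[] args) {
        return args.length == 2;
    }

    public static void main(String[] args) {
        if (!hasRequiredArgs(args)) {
            displayUsage();
            System.exit(1); // hypothetical non-zero exit status
        }
        System.err.println("Would compare " + args[0] + " against " + args[1]);
    }
}
```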
public static double[] doFreq(java.lang.String stringToAnalyze, int analysisCount, int analysisTotalCount, int refCount, int refTotalCount)
stringToAnalyze - The word to analyze.
analysisCount - Count of the word in the analysis text.
analysisTotalCount - Total number of words in the analysis text.
refCount - Count of the word in the reference text.
refTotalCount - Total number of words in the reference text.
Returns: the results array for the word. Its entries are as follows.
(0) Count of string occurrences in the analysis text.
(1) Occurrences of the string in the analysis text per 10,000 words.
(2) Count of string occurrences in the reference text.
(3) Occurrences of the string in the reference text per 10,000 words.
(4) Dunning's log-likelihood value.
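This page does not give the formula behind doFreq. The sketch below shows one standard way to fill the five documented entries, assuming Dunning's log-likelihood is computed as G² = 2 Σ O ln(O / E) over the 2×2 contingency table of "this word" versus "all other words" in the analysis and reference texts, where each expected count E is row total × column total / N. MorphAdorner's actual computation may differ (some tools sum only the two word cells). The class DoFreqSketch and helper logLikelihood are hypothetical, and counts are assumed not to exceed their totals.

```java
public class DoFreqSketch {

    /** O * ln(O / E), using the convention 0 * ln(0) = 0. */
    private static double term(double observed, double expected) {
        return observed == 0.0 ? 0.0 : observed * Math.log(observed / expected);
    }

    /** Dunning's G-squared over the 2x2 table {word, other words} x {analysis, reference}. */
    public static double logLikelihood(int a, int aTotal, int r, int rTotal) {
        double n = (double) aTotal + rTotal;
        double wordTotal = (double) a + r;
        double otherTotal = n - wordTotal;
        // Expected cell counts: row total * column total / grand total.
        double eA = aTotal * wordTotal / n;
        double eR = rTotal * wordTotal / n;
        double eAOther = aTotal * otherTotal / n;
        double eROther = rTotal * otherTotal / n;
        return 2.0 * (term(a, eA) + term(r, eR)
                + term(aTotal - a, eAOther) + term(rTotal - r, eROther));
    }

    /** Fills the results array in the documented order; stringToAnalyze is unused here. */
    public static double[] doFreq(String stringToAnalyze, int analysisCount, int analysisTotalCount,
                                  int refCount, int refTotalCount) {
        double[] results = new double[5];
        results[0] = analysisCount;
        results[1] = analysisTotalCount == 0 ? 0.0 : analysisCount * 10000.0 / analysisTotalCount;
        results[2] = refCount;
        results[3] = refTotalCount == 0 ? 0.0 : refCount * 10000.0 / refTotalCount;
        results[4] = logLikelihood(analysisCount, analysisTotalCount, refCount, refTotalCount);
        return results;
    }

    public static void main(String[] args) {
        double[] r = doFreq("whale", 50, 10000, 20, 20000);
        System.out.printf("%.1f %.1f %.1f %.1f %.4f%n", r[0], r[1], r[2], r[3], r[4]);
    }
}
```

G² is never negative, so it measures how strongly a word's rates differ but not in which direction; comparing entries (1) and (3) shows whether the word is over- or under-represented in the analysis text.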
public static void displayResults(java.util.Map<CompareStringCounts.ReverseScoredString,double[]> results)
results - The map of results to display.
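The output behavior described for this tool (tab-separated lines sorted by descending log-likelihood on standard output, a brief summary on standard error) can be sketched as follows. This is a hypothetical sketch, not the documented method: it keys the map by plain strings and sorts with a comparator instead of using CompareStringCounts.ReverseScoredString, the number formatting and summary wording are assumptions, and it prints only the string plus the five documented results entries (six columns), omitting the remaining column of the seven-column output, which this page does not describe.

```java
import java.io.PrintStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class DisplaySketch {

    /**
     * Writes one tab-separated line per string to out, sorted by descending
     * log-likelihood (results[4]), then a one-line summary to err.
     */
    public static void displayResults(Map<String, double[]> results, PrintStream out, PrintStream err) {
        List<Map.Entry<String, double[]>> rows = new ArrayList<>(results.entrySet());
        // Highest log-likelihood first.
        rows.sort((x, y) -> Double.compare(y.getValue()[4], x.getValue()[4]));
        for (Map.Entry<String, double[]> row : rows) {
            double[] r = row.getValue();
            // string, analysis count, analysis per 10,000, reference count, reference per 10,000, log-likelihood
            out.printf(Locale.ROOT, "%s\t%d\t%.2f\t%d\t%.2f\t%.4f%n",
                    row.getKey(), (long) r[0], r[1], (long) r[2], r[3], r[4]);
        }
        err.println(rows.size() + " strings compared.");
    }

    public static void main(String[] args) {
        // Made-up values for illustration only; not computed from real texts.
        Map<String, double[]> results = new LinkedHashMap<>();
        results.put("the", new double[] {120, 120.0, 2400, 120.0, 0.0});
        results.put("whale", new double[] {35, 35.0, 2, 1.0, 60.0});
        // Standard output can be redirected to a file; the summary stays on standard error.
        displayResults(results, System.out, System.err);
    }
}
```

Passing the two PrintStream objects in, rather than writing to System.out and System.err directly, keeps the stdout/stderr split explicit and makes the method easy to test.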