|
TagDiff
compares two columnar files containing spellings and part of
speech tags.
Usage:
tagdiff input1.tab postagcol1 input2.tab postagcol2
where
- input1.tab is an input tab-separated
file containing spellings in the first column and part of speech
tags in the column given by postagcol1. Usually this is a reference
(training) file in which the part of speech assignments are known
to be correct.
- postagcol1 is the column number (starting at 1)
which contains the part of speech tags in the first file.
- input2.tab is an input tab-separated file
containing spellings in the first column and part of speech
tags in the column given by postagcol2. Usually this is a file
produced by MorphAdorner or some other part of speech tagger.
- postagcol2 is the column number (starting at 1)
which contains the part of speech tags in the second file.
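As an illustration of the input layout described above, the following Python sketch reads tab-separated lines and pulls out the spelling and the tag from a 1-based column. It is not TagDiff's own code; the function name read_tagged, the sample words, and the sample tags are all made up for this example.

```python
def read_tagged(lines, pos_col):
    """Return (spelling, tag) pairs from tab-separated lines.

    pos_col is 1-based, matching the postagcol arguments of tagdiff.
    """
    if pos_col < 1:
        raise ValueError("column numbers start at 1")
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if pos_col > len(fields):
            raise ValueError(f"line has no column {pos_col}: {line!r}")
        # The spelling is always taken from column one;
        # the tag comes from the requested column.
        pairs.append((fields[0], fields[pos_col - 1]))
    return pairs


# Hypothetical sample data: spelling in column 1, tag in column 2.
sample = ["The\tDT\n", "\n", "dog\tNN\n", "barks\tVBZ\n"]
pairs = read_tagged(sample, 2)
```

With a real file, the same function could be called as read_tagged(open("input1.tab", encoding="utf-8"), 2).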
Blank lines are ignored in both files. Apart from blank lines,
the two files must have exactly the same number of lines and
exactly the same spellings, in the same order, in column one.
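The alignment requirement can be checked before any tags are compared. The sketch below is only an illustration, not TagDiff's implementation: it assumes blank lines are dropped first and the remaining lines are compared position by position, and the names spellings and first_misalignment are hypothetical.

```python
def spellings(lines):
    """Column-one spellings of the non-blank lines."""
    return [ln.rstrip("\r\n").split("\t")[0] for ln in lines if ln.strip()]


def first_misalignment(lines1, lines2):
    """Return None if the files line up, else a description of the first problem."""
    s1, s2 = spellings(lines1), spellings(lines2)
    for i, (a, b) in enumerate(zip(s1, s2), start=1):
        if a != b:
            return f"non-blank line {i}: {a!r} != {b!r}"
    if len(s1) != len(s2):
        return f"{len(s1)} non-blank lines vs {len(s2)}"
    return None
```

Returning a description rather than a plain True or False makes it easier to find the line where two files drift apart, for example when one tagger tokenizes a word differently.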
TagDiff writes a report to standard output tallying the number
and types of differences between the part of speech assignments
in the two files. If the first file is
a reference file, this allows you to see how well the part of
speech tagger reproduced the reference tagging. A good part
of speech tagger for English normally gets at least 96% of the
tags correct.
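The kind of tally described here can be sketched in a few lines of Python. This is an assumption about what such a report contains, not a reproduction of TagDiff's actual output format; the function name tally and the sample tags are hypothetical.

```python
from collections import Counter


def tally(ref_tags, out_tags):
    """Count (reference tag, tagger tag) disagreements and overall accuracy."""
    if len(ref_tags) != len(out_tags):
        raise ValueError("tag lists must have the same length")
    # Each key is a pair (tag in the reference file, tag in the tagger output).
    diffs = Counter((r, o) for r, o in zip(ref_tags, out_tags) if r != o)
    total = len(ref_tags)
    correct = total - sum(diffs.values())
    accuracy = correct / total if total else 0.0
    return diffs, accuracy


# Hypothetical tags for four aligned words.
ref = ["DT", "NN", "VBZ", "NN"]
out = ["DT", "NN", "NNS", "VB"]
diffs, accuracy = tally(ref, out)
for (r, o), n in diffs.most_common():
    print(f"{r} -> {o}: {n}")
print(f"accuracy: {accuracy:.1%}")
```

Grouping the differences by tag pair shows which confusions are most frequent, which is usually more informative than the overall percentage alone.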
|