Comparing Adorned Files

TagDiff compares two columnar files containing spellings and part of speech tags.

Usage:

tagdiff input1.tab postagcol1 input2.tab postagcol2

where

  • input1.tab is a tab-separated input file containing spellings in the first column and part of speech tags in the column given by postagcol1. Usually this is a reference (training) file in which the part of speech assignments are known to be correct.
  • postagcol1 is the column number (starting at 1) that contains the part of speech tags in the first file.
  • input2.tab is a tab-separated input file containing spellings in the first column and part of speech tags in the column given by postagcol2. Usually this is a file produced by MorphAdorner or some other part of speech tagger.
  • postagcol2 is the column number (starting at 1) that contains the part of speech tags in the second file.

Blank lines are ignored in both files. Apart from blank lines, the two files must have exactly the same number of lines and exactly the same spellings, in the same order, in column one.
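
The alignment requirement can be checked before running a comparison. The following is a minimal Python sketch, not TagDiff's own code: the function names read_tagged and check_aligned are hypothetical, and the sample tags in the usage note are made up for illustration. It assumes only what is stated above: tab-separated lines, spellings in column one, a 1-based tag column, and blank lines skipped.

```python
def read_tagged(lines, pos_col):
    """Return (spelling, tag) pairs from tab-separated lines.

    pos_col is 1-based, like the postagcol arguments described above.
    Blank lines are skipped, so they do not count toward alignment.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are ignored
        fields = line.split("\t")
        pairs.append((fields[0], fields[pos_col - 1]))
    return pairs


def check_aligned(pairs1, pairs2):
    """Raise ValueError unless both inputs list the same spellings in order."""
    if len(pairs1) != len(pairs2):
        raise ValueError(
            f"line counts differ: {len(pairs1)} vs {len(pairs2)}")
    for i, ((w1, _), (w2, _)) in enumerate(zip(pairs1, pairs2), start=1):
        if w1 != w2:
            raise ValueError(
                f"spelling mismatch at non-blank line {i}: {w1!r} vs {w2!r}")
```

For example, read_tagged(open("input1.tab", encoding="utf-8"), 2) would read the first file with tags in column two. Calling check_aligned on the two resulting lists then reports the first place where the files disagree on spelling or length.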

TagDiff writes a report to standard output tallying the number and types of differences between the part of speech assignments in the two files. If the first file is a reference file, the report shows how well the part of speech tagger reproduced the reference tagging. A good part of speech tagger for English normally gets at least 96% of the tags correct.
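
The kind of tally described above can be illustrated with a short Python sketch. This is an assumption about what such a report contains, not TagDiff's actual implementation or output format: the function name tally_differences is hypothetical, and it takes two already aligned lists of tags (reference first).

```python
from collections import Counter


def tally_differences(ref_tags, test_tags):
    """Return (accuracy, confusions) for two aligned lists of tags.

    confusions maps (reference_tag, test_tag) pairs to the number of
    positions where the second file disagreed with the reference.
    """
    if len(ref_tags) != len(test_tags):
        raise ValueError("tag lists must have the same length")
    confusions = Counter(
        (r, t) for r, t in zip(ref_tags, test_tags) if r != t)
    total = len(ref_tags)
    correct = total - sum(confusions.values())
    accuracy = correct / total if total else 0.0
    return accuracy, confusions


if __name__ == "__main__":
    # Made-up tags, for illustration only.
    reference = ["dt", "n1", "vvz", "av"]
    tagged = ["dt", "n1", "n2", "av"]
    accuracy, confusions = tally_differences(reference, tagged)
    print(f"accuracy: {accuracy:.2%}")
    for (ref_tag, test_tag), count in confusions.most_common():
        print(f"{ref_tag} -> {test_tag}: {count}")
```

Keying the counter on (reference tag, test tag) pairs keeps the types of disagreement separate, so the most frequent confusions can be listed first.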

 

E-mail: pib@northwestern.edu
Last updated Sun Mar 15 05:52:32 2009. © 2007, 2008 Northwestern University.