MorphAdorner Northwestern
 
Converting an adorned file to TCF format

AdornedToTCF04 converts one or more adorned files to the Text Corpus Format (TCF) v0.4 used by the CLARIN-D project.

Usage:

adornedtotcf04 outputdirectory adorned1.xml adorned2.xml ...

where

  • outputdirectory specifies the directory that receives the TCF v0.4 formatted files.
  • adorned1.xml adorned2.xml ... specify the input MorphAdorned XML files to convert to TCF v0.4.
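
As a minimal sketch of batch use, the Python helper below assembles the command line shown above for every XML file in a directory. The names build_command and convert are hypothetical, and the sketch assumes that adornedtotcf04 is available on the PATH (the launcher script name or extension may differ on your platform) and that the output directory may need to be created beforehand.

```python
from pathlib import Path
import subprocess


def build_command(output_dir, input_dir):
    """Return the adornedtotcf04 argument list for all *.xml files in input_dir.

    Hypothetical helper: it only mirrors the documented usage line
    'adornedtotcf04 outputdirectory adorned1.xml adorned2.xml ...'.
    """
    inputs = sorted(str(p) for p in Path(input_dir).glob("*.xml"))
    if not inputs:
        raise ValueError(f"no adorned XML files found in {input_dir}")
    return ["adornedtotcf04", str(output_dir), *inputs]


def convert(output_dir, input_dir):
    """Run the converter on every adorned file in input_dir.

    Assumption: the converter may not create the output directory itself,
    so it is created here first.
    """
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_command(output_dir, input_dir), check=True)
```

Calling convert("tcf-out", "adorned") would then run the converter once over all files in the (hypothetical) adorned directory.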

The Text Corpus Format (TCF) is used by the European CLARIN-D project to exchange corpora among different web-based services. TCF is an XML-based format that consists of a plain text representation of a work along with a series of annotation layers.

AdornedToTCF04 reads each MorphAdorned TEI XML file, extracts the text (without tags), and writes it out along with the following annotation layers:

  • Tokens (using the MorphAdorner word IDs)
  • Lemmata
  • Part of speech tags
  • Sentences
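
To illustrate how these layers relate to one another, here is a rough Python sketch that reads them back from a TCF document. The element and attribute names (text, tokens/token with ID, lemmas/lemma, POStags/tag, sentences/sentence, tokenIDs) reflect a general understanding of TCF v0.4 and should be checked against the files AdornedToTCF04 actually writes. The embedded sample, including its word IDs, tag values, and tagset name, is invented for illustration and is not real converter output.

```python
import xml.etree.ElementTree as ET

# Illustrative sample only; not actual AdornedToTCF04 output.
SAMPLE = """<D-Spin xmlns="http://www.dspin.de/data" version="0.4">
  <TextCorpus xmlns="http://www.dspin.de/data/textcorpus" lang="en">
    <text>The cats sleep.</text>
    <tokens>
      <token ID="w1">The</token>
      <token ID="w2">cats</token>
      <token ID="w3">sleep</token>
      <token ID="w4">.</token>
    </tokens>
    <lemmas>
      <lemma tokenIDs="w1">the</lemma>
      <lemma tokenIDs="w2">cat</lemma>
      <lemma tokenIDs="w3">sleep</lemma>
      <lemma tokenIDs="w4">.</lemma>
    </lemmas>
    <POStags tagset="nupos">
      <tag tokenIDs="w1">dt</tag>
      <tag tokenIDs="w2">n2</tag>
      <tag tokenIDs="w3">vvb</tag>
      <tag tokenIDs="w4">.</tag>
    </POStags>
    <sentences>
      <sentence tokenIDs="w1 w2 w3 w4"/>
    </sentences>
  </TextCorpus>
</D-Spin>"""


def local(tag):
    """Strip the '{namespace}' prefix so no namespace URI has to be assumed."""
    return tag.rsplit("}", 1)[-1]


def read_layers(xml_text):
    """Return (text, tokens, lemmas, pos, sentences) from a TCF string.

    tokens, lemmas, and pos are dicts keyed by token ID; sentences is a
    list of token-ID lists.
    """
    root = ET.fromstring(xml_text)
    text, tokens, lemmas, pos, sentences = None, {}, {}, {}, []
    for layer in root.iter():
        name = local(layer.tag)
        if name == "text":
            text = layer.text
        elif name == "tokens":
            for tok in layer:
                tokens[tok.get("ID")] = tok.text
        elif name == "lemmas":
            for lem in layer:
                for tid in lem.get("tokenIDs", "").split():
                    lemmas[tid] = lem.text
        elif name == "POStags":
            for tag in layer:
                for tid in tag.get("tokenIDs", "").split():
                    pos[tid] = tag.text
        elif name == "sentences":
            for sent in layer:
                sentences.append(sent.get("tokenIDs", "").split())
    return text, tokens, lemmas, pos, sentences


if __name__ == "__main__":
    text, tokens, lemmas, pos, sentences = read_layers(SAMPLE)
    for sentence in sentences:
        for tid in sentence:
            print(tid, tokens[tid], lemmas.get(tid), pos.get(tid))
```

The point of the sketch is that every annotation layer refers back to the token layer by ID, so the token IDs (here, the MorphAdorner word IDs) are what tie lemmata, part of speech tags, and sentences together.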