public class AdornedToTCF04
extends java.lang.Object
AdornedToTCF04 converts one or more adorned files to the Text Corpus Format (TCF) v0.4 used by the CLARIN-D project.
Usage:
adornedtotcf04 outputdirectory adorned1.xml adorned2.xml ...
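where outputdirectory is the output directory and adorned1.xml, adorned2.xml, ... are the adorned XML input files.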
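The command line above can be read as a fixed number of leading parameters followed by the input file names. The sketch below shows one way such a split could be done. It is not the class's actual code: the class `ArgSplitSketch`, its method names, and the value 1 for `INIT_PARAMS` are assumptions made for illustration (the documented `INITPARAMS` field holds the number of parameters before the input file specifications, but its value is not stated on this page).

```java
import java.util.Arrays;

/** Hypothetical sketch: split "outputdirectory adorned1.xml ..." style arguments. */
class ArgSplitSketch {
    /** Assumed number of parameters that precede the input file names. */
    static final int INIT_PARAMS = 1;

    /** Returns the output directory, or null when no input files follow it. */
    static String outputDirectory(String[] args) {
        return args.length > INIT_PARAMS ? args[0] : null;
    }

    /** Returns the input file names that follow the leading parameters. */
    static String[] inputFiles(String[] args) {
        if (args.length <= INIT_PARAMS) {
            return new String[0];
        }
        return Arrays.copyOfRange(args, INIT_PARAMS, args.length);
    }

    public static void main(String[] args) {
        String out = outputDirectory(args);
        if (out == null) {
            System.err.println(
                "Usage: adornedtotcf04 outputdirectory adorned1.xml adorned2.xml ...");
            return;
        }
        System.out.println("Output directory: " + out);
        System.out.println("Documents to process: " + inputFiles(args).length);
    }
}
```

Keeping the count of leading parameters in one constant means the usage check and the slice of input files cannot drift apart if another leading parameter is added later.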
The Text Corpus Format (TCF) is used by the European CLARIN-D project to allow interchange of corpora among different web-based services. TCF is an XML-based format which consists of a plain text representation of a work along with a series of annotation layers.
AdornedToTCF04 converts one or more MorphAdorned TEI XML files to TCF. The text (without tags) is extracted and output, along with a series of annotation layers.
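To illustrate what extracting the text without tags means, here is a minimal sketch that uses only the standard Java XML parser. It is an assumption-laden illustration, not the converter's own logic: the class `PlainTextSketch`, the method `extractText`, and the sample markup are hypothetical, and real adorned files carry much richer markup.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

/** Hypothetical sketch: recover the plain text of an XML fragment. */
class PlainTextSketch {
    /**
     * Parses the XML string and returns its text with all markup dropped.
     * Throws IllegalArgumentException when the input is not well-formed.
     */
    static String extractText(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getTextContent() concatenates every descendant text node.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Sample markup invented for this example.
        String xml = "<p>Hello <hi>big</hi> world</p>";
        System.out.println(extractText(xml));
    }
}
```

A DOM parse is the simplest option for a sketch; for large corpora a streaming parser would likely use less memory.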
Modifier and Type | Class and Description |
---|---|
static class | AdornedToTCF04.MyToken |
Modifier and Type | Field and Description |
---|---|
protected static int | currentDocNumber: Current document. |
protected static int | docsToProcess: Number of documents to process. |
protected static int | INITPARAMS: Number of parameters before the input file specifications. |
protected static java.lang.String | inputDirectory: Input directory. |
protected static java.lang.String | outputDirectory: Output directory. |
protected static java.io.PrintStream | outputFileStream: Output file stream. |
protected static java.io.PrintStream | printStream: Wrapper for printStream to allow UTF-8 output. |
Constructor and Description |
---|
AdornedToTCF04() |
Modifier and Type | Method and Description |
---|---|
protected static boolean | initialize(java.lang.String[] args): Initialize. |
static void | main(java.lang.String[] args): Main program. |
protected static int | processFiles(java.lang.String[] args): Process files. |
protected static void | processOneFile(java.lang.String xmlFileName): Process one file. |
protected static void | terminate(int filesProcessed, long processingTime): Terminate. |
protected static int docsToProcess
protected static int currentDocNumber
protected static java.lang.String inputDirectory
protected static java.lang.String outputDirectory
protected static java.io.PrintStream outputFileStream
protected static java.io.PrintStream printStream
protected static final int INITPARAMS
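The `printStream` field is described as a wrapper that allows UTF-8 output. A common way to build such a wrapper with the standard library is sketched below. The class `Utf8PrintSketch` and the method `utf8Stream` are hypothetical names, and this is not necessarily how the class itself sets up its stream.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

/** Hypothetical sketch: wrap an output stream so text is written as UTF-8. */
class Utf8PrintSketch {
    /** Returns an auto-flushing PrintStream that encodes characters as UTF-8. */
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        // Unicode escapes keep this source file independent of its own encoding.
        out.println("Gr\u00f6\u00dfe, na\u00efve");
    }
}
```

Without such a wrapper, `System.out` uses the platform default encoding, which can corrupt non-ASCII characters on some systems.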
public static void main(java.lang.String[] args)
Parameters: args - Program parameters.
protected static boolean initialize(java.lang.String[] args) throws java.lang.Exception
Throws: java.lang.Exception
protected static void processOneFile(java.lang.String xmlFileName)
Parameters: xmlFileName - Adorned XML file name to convert to TCF.
protected static int processFiles(java.lang.String[] args) throws java.lang.Exception
Throws: java.lang.Exception
protected static void terminate(int filesProcessed, long processingTime)
Parameters: filesProcessed - Number of files processed.
processingTime - Processing time in seconds.
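The two values passed to `terminate` are a file count and an elapsed time in whole seconds. The sketch below shows how a driver might compute and report them. Everything in it is assumed for illustration: the class `TimingSketch`, its helper methods, and the wording of the summary message are hypothetical and are not taken from the class itself.

```java
/** Hypothetical sketch: time a batch run and report a closing summary. */
class TimingSketch {
    /** Converts a start/end pair in milliseconds to whole elapsed seconds. */
    static long elapsedSeconds(long startMillis, long endMillis) {
        return (endMillis - startMillis) / 1000L;
    }

    /** Builds a summary line; the wording here is invented for the example. */
    static String summary(int filesProcessed, long processingTime) {
        String noun = (filesProcessed == 1) ? " file" : " files";
        return "Processed " + filesProcessed + noun
            + " in " + processingTime + " seconds";
    }

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int filesProcessed = 0;
        for (String fileName : args) {
            // A real converter would process fileName here.
            filesProcessed++;
        }
        long seconds = elapsedSeconds(start, System.currentTimeMillis());
        System.out.println(summary(filesProcessed, seconds));
    }
}
```

Integer division truncates, so a run shorter than one second is reported as 0 seconds.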