public class CountAdornedWords
extends java.lang.Object
Usage:
java edu.northwestern.at.morphadorner.tools.countadornedwords.CountAdornedWords outputdir input.tab input2.tab ...
outputdir -- output directory to receive tab-separated values files described below, one for each input file.
input*.tab -- input tabbed files produced as output by XMLToTab.
Each output file is a tab-delimited UTF-8 text file containing the following fields, in order.
Modifier and Type | Field and Description |
---|---|
protected static java.util.Map<AdornedWordCountInfo,java.lang.Integer> | adornedWordInfoMap: Adorned word info map. |
protected static int | currentFileNumber: Current document. |
protected static int | DIVTYPE |
protected static int | filesToProcess: Number of documents to process. |
protected static int | INITPARAMS: Number of parameters before the input file specifications. |
protected static int | LEMMA |
protected static java.lang.String | outputDirectory: Output directory. |
protected static int | PATH |
protected static int | POS |
protected static int | SPELLING |
protected static int | STANDARD |
protected static TEITagClassifier | tagClassifier: TEI tag classifier. |
protected static int | totalWords: Total words found. |
protected static int | uniqueWords: Unique words found. |
protected static int | WORKID: Tabular data input fields. |
Constructor and Description |
---|
CountAdornedWords() |
Modifier and Type | Method and Description |
---|---|
static void | incrementWordCountMap(AdornedWordCountInfo adornedWordInfo): Updates counts for an adorned word in a set. |
protected static boolean | initialize(java.lang.String[] args): Initialize. |
static void | main(java.lang.String[] args): Main program. |
protected static int | processFiles(java.lang.String[] args): Process files. |
protected static void | processOneFile(java.lang.String tabFileName): Process one file. |
static void | saveWordInfo(java.io.File adornedWordInfoFile, java.lang.String encoding): Save adorned word count information to a file. |
protected static void | terminate(int filesProcessed, long processingTime): Terminate. |
protected static final int WORKID
protected static final int SPELLING
protected static final int STANDARD
protected static final int LEMMA
protected static final int POS
protected static final int PATH
protected static final int DIVTYPE
protected static java.util.Map<AdornedWordCountInfo,java.lang.Integer> adornedWordInfoMap
protected static java.lang.String outputDirectory
protected static final int INITPARAMS
protected static int filesToProcess
protected static int currentFileNumber
protected static int totalWords
protected static int uniqueWords
protected static TEITagClassifier tagClassifier
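The constants WORKID through DIVTYPE are described only as tabular data input fields. One plausible reading, which this page does not confirm, is that they are column positions in each tab-separated input line. The sketch below illustrates that reading under stated assumptions: the class TabFieldsSketch, the method splitLine, the numeric values, and the sample line are all invented for the example and may differ from the real constants and data.

```java
final class TabFieldsSketch {
    // Assumed zero-based column positions; the real constant values are not given on this page.
    static final int WORKID = 0;
    static final int SPELLING = 1;
    static final int STANDARD = 2;
    static final int LEMMA = 3;
    static final int POS = 4;

    /** Splits one tab-separated line; the -1 limit keeps trailing empty fields. */
    static String[] splitLine(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) {
        // Made-up sample line, not real XMLToTab output.
        String[] fields = splitLine("work1\tloued\tloved\tlove\tvvd");
        System.out.println(fields[LEMMA] + " / " + fields[POS]);
    }
}
```

Passing -1 as the limit to String.split matters for tabular data: with the default limit, trailing empty columns are dropped and later index lookups can fail.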
public static void main(java.lang.String[] args)
Parameters:
args - Program parameters.

protected static boolean initialize(java.lang.String[] args) throws java.lang.Exception
Throws:
java.lang.Exception

protected static void processOneFile(java.lang.String tabFileName)
Parameters:
tabFileName - Tabbed input file name.

protected static int processFiles(java.lang.String[] args) throws java.lang.Exception
Parameters:
args - Program arguments.
Throws:
java.lang.Exception

protected static void terminate(int filesProcessed, long processingTime)
Parameters:
filesProcessed - Number of files processed.
processingTime - Processing time in seconds.

public static void incrementWordCountMap(AdornedWordCountInfo adornedWordInfo)
Parameters:
adornedWordInfo - The adorned word information.

public static void saveWordInfo(java.io.File adornedWordInfoFile, java.lang.String encoding) throws java.io.IOException, java.io.FileNotFoundException
Parameters:
adornedWordInfoFile - Output file name.
encoding - Character encoding for the file.
Throws:
java.io.IOException - If an error occurs writing the output file.
java.io.FileNotFoundException
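
Writing a count map to a tab-delimited file in a named encoding, as saveWordInfo is documented to do, might look like the sketch below. It is an illustration under assumptions, not the MorphAdorner implementation: SaveCountsSketch and save are hypothetical names, the key is simplified to a String, and the two-column layout (key, count) is invented because the actual output field list is not shown on this page.

```java
import java.io.BufferedWriter;
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

final class SaveCountsSketch {
    /** Writes one "key TAB count" line per entry, sorted by key, in the given encoding. */
    static void save(Map<String, Integer> counts, File outputFile, String encoding)
            throws IOException {
        try (BufferedWriter writer =
                Files.newBufferedWriter(outputFile.toPath(), Charset.forName(encoding))) {
            // Copy into a TreeMap so the output order is deterministic.
            for (Map.Entry<String, Integer> entry : new TreeMap<>(counts).entrySet()) {
                writer.write(entry.getKey());
                writer.write('\t');
                writer.write(Integer.toString(entry.getValue()));
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("love", 2);
        counts.put("hate", 1);
        File out = File.createTempFile("counts", ".tab");
        save(counts, out, "UTF-8");
        System.out.println("wrote " + out.getAbsolutePath());
    }
}
```

The try-with-resources block closes the writer even when a write fails, so a partially written file is not left open.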