public class AdornedToSimpleTEIP5
extends java.lang.Object
AdornedToSimpleTEIP5 converts a base-level MorphAdorner file to a more TEI P5-like format.
Usage:
adornedtosimpleteip5 outputdirectory [usereg|usechoice] interpgrp.xml goodfiles.txt badfiles.txt adorned1.xml adorned2.xml ...
Modifier and Type | Field and Description |
---|---|
protected static java.lang.String | badWorksFileName: Name of the file that holds the names of works that fail conversion. |
protected static java.util.Set<java.lang.String> | badWorksSet: File names of works containing errors. |
protected static int | currentDocNumber: Number of the current document. |
protected static int | docsToProcess: Number of documents to process. |
protected static boolean | forceAna: Force ana=#pos output for part of speech. |
protected static int | gapCount: Gap count. |
protected static java.lang.String | goodWorksFileName: Name of the file that holds the names of works for which conversion succeeds. |
protected static java.util.Set<java.lang.String> | goodWorksSet: File names of works converted successfully. |
protected static boolean | haveInterpGrp: True if the interpGrp text is not empty. |
protected static int | INITPARAMS: Number of parameters before the input file specifications. |
protected static java.lang.String | interpGrpXMLText: XML text of the interpGrp section defining part of speech tags. |
protected static java.lang.String | lastID: Last word ID processed. |
protected static java.lang.String | outputDirectory: Output directory. |
protected static java.io.PrintStream | printStream: PrintStream wrapper that allows UTF-8 output. |
protected static int | sentenceCount: Sentence count. |
protected static org.jdom2.Namespace | teiNamespace: TEI namespace. |
protected static boolean | useReg: Use reg= instead of a choice structure. |
Constructor and Description |
---|
AdornedToSimpleTEIP5() |
Modifier and Type | Method and Description |
---|---|
protected static void | addWordID(org.jdom2.Element wordElement, java.util.Set<java.lang.String> wordIDs): Save word ID in set. |
protected static org.jdom2.Element | cleanWElement(org.jdom2.Element element): Clean word element. |
protected static org.jdom2.Element | createElement(java.lang.String name): Create an element. |
protected static java.lang.String | displayElement(org.jdom2.Element element): Display element. |
protected static org.jdom2.Element | generateChoice(org.jdom2.Element element, java.lang.String wordText, java.lang.String regText): Generate choice element for word structure. |
protected static void | handleGap(org.jdom2.Content content, boolean inSplit, java.util.List<org.jdom2.Element> splitWordElements): Handle gap tag. |
protected static void | handleSup(org.jdom2.Content content): Handle sup tag. |
protected static int | handleW(java.util.List<org.jdom2.Content> contents, int index): Handle w tag. |
protected static boolean | initialize(java.lang.String[] args): Initialize. |
static void | main(java.lang.String[] args): Main program. |
protected static int | processFiles(java.lang.String[] args): Process files. |
protected static void | processOneFile(java.lang.String xmlFileName): Process one file. |
protected static void | replaceSupWithHi(org.jdom2.Element element): Replace sup with hi. |
protected static void | terminate(int filesProcessed, long processingTime): Terminate. |
protected static int docsToProcess
protected static int currentDocNumber
protected static java.lang.String interpGrpXMLText
protected static boolean haveInterpGrp
protected static boolean forceAna
protected static boolean useReg
protected static java.lang.String outputDirectory
protected static java.io.PrintStream printStream
protected static final int INITPARAMS
protected static java.lang.String lastID
protected static int gapCount
protected static int sentenceCount
protected static java.lang.String badWorksFileName
protected static java.util.Set<java.lang.String> badWorksSet
protected static java.lang.String goodWorksFileName
protected static java.util.Set<java.lang.String> goodWorksSet
protected static org.jdom2.Namespace teiNamespace
public static void main(java.lang.String[] args)
args - Program parameters.
protected static boolean initialize(java.lang.String[] args) throws java.lang.Exception
Throws: java.lang.Exception
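The Usage line lists five arguments before the adorned input files, which is presumably what INITPARAMS counts. The following is a minimal sketch of how such an argument array could be split up; it is not the class's actual initialize implementation. The class name ConverterArgsSketch, the constant ASSUMED_INIT_PARAMS (set to 5 by reading the Usage line, since the real value of INITPARAMS is not documented here), the case-sensitive mode check, and the requirement of at least one input file are all assumptions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical holder for the command-line arguments shown in the Usage line. */
final class ConverterArgsSketch {
    // Assumption: five arguments precede the input files, as in the Usage line.
    // The documented INITPARAMS field may hold a different value.
    static final int ASSUMED_INIT_PARAMS = 5;

    final String outputDirectory;
    final boolean useReg;
    final String interpGrpFileName;
    final String goodWorksFileName;
    final String badWorksFileName;
    final List<String> inputFiles;

    private ConverterArgsSketch(String[] args, boolean useReg) {
        this.outputDirectory = args[0];
        this.useReg = useReg;
        this.interpGrpFileName = args[2];
        this.goodWorksFileName = args[3];
        this.badWorksFileName = args[4];
        // Everything after the leading arguments is treated as an input file.
        this.inputFiles = Collections.unmodifiableList(
            Arrays.asList(Arrays.copyOfRange(args, ASSUMED_INIT_PARAMS, args.length)));
    }

    /** Splits the argument array; rejects arrays without at least one input file. */
    static ConverterArgsSketch parse(String[] args) {
        if (args == null || args.length <= ASSUMED_INIT_PARAMS) {
            throw new IllegalArgumentException(
                "expected " + ASSUMED_INIT_PARAMS + " leading arguments and at least one input file");
        }
        String mode = args[1];
        // Assumption: the mode keyword is matched exactly, in lower case.
        if (!mode.equals("usereg") && !mode.equals("usechoice")) {
            throw new IllegalArgumentException("second argument must be usereg or usechoice");
        }
        return new ConverterArgsSketch(args, mode.equals("usereg"));
    }
}
```

Calling ConverterArgsSketch.parse(args) from a main method would yield the output directory, the reg/choice switch, the three auxiliary file names, and the list of adorned files to convert.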
protected static void processOneFile(java.lang.String xmlFileName)
xmlFileName - Name of the adorned XML file to convert.
protected static void addWordID(org.jdom2.Element wordElement, java.util.Set<java.lang.String> wordIDs)
wordElement - Word element.
wordIDs - Set of word IDs.
protected static int handleW(java.util.List<org.jdom2.Content> contents, int index)
contents - List of content nodes.
index - Index of "w" node to process.
protected static org.jdom2.Element cleanWElement(org.jdom2.Element element)
element - w element.
protected static org.jdom2.Element createElement(java.lang.String name)
name - Element name.
protected static void handleGap(org.jdom2.Content content, boolean inSplit, java.util.List<org.jdom2.Element> splitWordElements)
content - Node content.
inSplit - Processing split word.
splitWordElements - Split word elements.
protected static void handleSup(org.jdom2.Content content)
content - Node content.
protected static void replaceSupWithHi(org.jdom2.Element element)
element - sup element to convert and replace with hi.
protected static java.lang.String displayElement(org.jdom2.Element element)
element - Element to display.
protected static org.jdom2.Element generateChoice(org.jdom2.Element element, java.lang.String wordText, java.lang.String regText)
element - Parent element for choice structure.
wordText - Original word text. May be null.
regText - Standard spelling.
Emits a choice structure of the form:
<w...>
  <choice>
    <orig>original spelling</orig>
    or
    <orig> <seg ...> <seg ...> ... </orig>
    <reg>standard spelling</reg>
  </choice>
</w>
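The choice structure described for generateChoice can be illustrated with a short sketch. This is not the class's implementation: it uses the JDK's built-in org.w3c.dom API rather than JDOM2 so that it runs without extra dependencies, it ignores the TEI namespace, and it covers only the simple form with text inside orig. The class ChoiceSketch and its methods appendChoice and newDocument are hypothetical names. Leaving orig empty when wordText is null is also an assumption; the documentation does not say how the seg form is populated.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical illustration of the choice/orig/reg structure, using the JDK DOM. */
final class ChoiceSketch {
    /**
     * Appends choice(orig, reg) to the given w element and returns the choice element.
     * Assumption: a null wordText leaves orig empty for the caller to fill with seg children.
     */
    static Element appendChoice(Element wordElement, String wordText, String regText) {
        Document doc = wordElement.getOwnerDocument();
        Element choice = doc.createElement("choice");

        Element orig = doc.createElement("orig");
        if (wordText != null) {
            orig.setTextContent(wordText);   // original spelling
        }

        Element reg = doc.createElement("reg");
        reg.setTextContent(regText);         // standard spelling

        choice.appendChild(orig);
        choice.appendChild(reg);
        wordElement.appendChild(choice);
        return choice;
    }

    /** Creates an empty DOM document, wrapping the checked configuration exception. */
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a w element created from ChoiceSketch.newDocument(), calling appendChoice(w, "loue", "love") would leave w containing a single choice child whose orig holds the original spelling and whose reg holds the standard spelling.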
protected static int processFiles(java.lang.String[] args)
protected static void terminate(int filesProcessed, long processingTime)
filesProcessed - Number of files processed.
processingTime - Processing time in seconds.
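The field summary describes goodWorksFileName and badWorksFileName as files that hold the names of works whose conversion succeeded or failed, backed by the goodWorksSet and badWorksSet fields. This page does not document when or how those files are written. A minimal sketch of one way to persist such a set, assuming one name per line, sorted, in UTF-8, is given below; WorksListSketch, writeNames, and readNames are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for saving a set of work file names to a text file. */
final class WorksListSketch {
    /** Writes one name per line, in sorted order, encoded as UTF-8. */
    static void writeNames(Path file, Set<String> names) {
        // Assumption: sorted output is wanted; the real class may keep another order.
        Set<String> sorted = new TreeSet<>(names);
        try {
            Files.write(file, sorted, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the names back, one per line. */
    static List<String> readNames(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller could invoke writeNames once for the successful set and once for the failed set after all files have been processed.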