|
FixXMLQuotes attempts to convert straight double quotes
(ASCII/Unicode 34) into "curly" left and right double quotes
(Unicode 8220 and 8221 respectively). It also attempts to convert straight
single quotes (ASCII/Unicode 39) into "curly" left and right single quotes
(Unicode 8216 and 8217 respectively) and to distinguish these from the use
of the single quote as an apostrophe. FixXMLQuotes makes mistakes,
so its output should be reviewed and corrected manually. FixXMLQuotes
accepts XML files in TEI format as input.
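The kind of conversion described above can be illustrated with a
minimal Python sketch. This is not the FixXMLQuotes algorithm, which
is not described here; the function name naive_curl and its single
rule (a quote is an opening quote when it starts the text or follows
whitespace or an opening bracket) are assumptions made for
illustration only.

```python
# Hypothetical illustration only; not the MorphAdorner implementation.
LDQ, RDQ = "\u201c", "\u201d"  # Unicode 8220, 8221
LSQ, RSQ = "\u2018", "\u2019"  # Unicode 8216, 8217 (8217 also serves as apostrophe)


def naive_curl(text: str) -> str:
    """Replace straight quotes using one simple positional rule."""
    out = []
    for i, ch in enumerate(text):
        if ch not in ('"', "'"):
            out.append(ch)
            continue
        # Look at the character before the quote in the original text.
        prev = text[i - 1] if i > 0 else " "
        opening = prev.isspace() or prev in "([{"
        if ch == '"':
            out.append(LDQ if opening else RDQ)
        else:
            out.append(LSQ if opening else RSQ)
    return "".join(out)


if __name__ == "__main__":
    print(naive_curl('"Don\'t go," she said.'))
    # A leading apostrophe is treated as an opening single quote here,
    # which is the kind of mistake that needs manual correction.
    print(naive_curl("'tis true"))
```

Even this small rule shows why manual correction is needed: a
word-initial apostrophe, as in 'tis, looks exactly like an opening
single quote to a purely positional test.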
Usage:
fixxmlquotes softtags.txt jumptags.txt outputdirectory input1.xml input2.xml ...
where
- softtags.txt specifies a text file containing a list of soft
XML tags, one per line. A sample is included as part of the
MorphAdorner distribution.
- jumptags.txt specifies a text file containing a list of
jump XML tags, one per line. A sample is included as part of the
MorphAdorner distribution.
- outputdirectory specifies the output directory to receive
XML files with quote marks fixed.
- input*.xml specifies the input TEI XML files.
For each of the input XML files, FixXMLQuotes attempts to correct the
quotes and writes a corrected XML file of the same name in the
specified output directory.
The companion FixQuotes program applies
the same approach to correcting quote marks, but to plain text
files instead of XML files.
Usage:
fixquotes input.txt output.txt
where
- input.txt specifies the input text file with quote marks
to correct.
- output.txt specifies the output text file with quote marks
fixed.
At best, fixxmlquotes and fixquotes
correct 90% of the quotes. The remainder must be corrected
manually.
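To help locate quotes that need manual correction, a small check can
flag paragraphs in which the curly double quotes do not pair up. The
Python sketch below is an assumption-laden aid, not part of
MorphAdorner: the function name suspicious_paragraphs is hypothetical,
paragraphs are assumed to be separated by blank lines, and only double
quotes are examined.

```python
# Hypothetical review aid; not part of the MorphAdorner distribution.
import re

LDQ, RDQ = "\u201c", "\u201d"  # Unicode 8220, 8221


def suspicious_paragraphs(text: str) -> list[int]:
    """Return 1-based numbers of paragraphs with unpaired double quotes."""
    flagged = []
    paragraphs = re.split(r"\n\s*\n", text)
    for number, para in enumerate(paragraphs, start=1):
        depth = 0
        ok = True
        for ch in para:
            if ch == LDQ:
                depth += 1
            elif ch == RDQ:
                depth -= 1
            # A close before an open, or an open inside an open, is suspect.
            if depth < 0 or depth > 1:
                ok = False
                break
        if not ok or depth != 0:
            flagged.append(number)
    return flagged


if __name__ == "__main__":
    sample = (
        "\u201cFine,\u201d he said.\n\n"
        "\u201dOops,\u201c she said.\n\n"
        "\u201cUnclosed."
    )
    print(suspicious_paragraphs(sample))
```

A flagged paragraph is only a candidate for review. A quotation that
legitimately runs across several paragraphs will also be flagged, and
a paragraph with balanced but wrongly placed quotes will not.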
|