NU IT
Northwestern University Information Technology
MorphAdorner Northwestern
 
Correcting Quote Marks

FixXMLQuotes attempts to convert straight double quotes (ASCII/Unicode 34) into "curly" left and right double quotes (Unicode 8220 and 8221, respectively). It also attempts to convert straight single quotes (ASCII/Unicode 39) into "curly" left and right single quotes (Unicode 8216 and 8217, respectively) and to distinguish single quotation marks from single quotes used as apostrophes. FixXMLQuotes makes mistakes, so its output should be checked and corrected manually. FixXMLQuotes accepts TEI XML files as input.
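To illustrate the kind of context-based guessing involved, here is a minimal sketch in Python. It is a simplified heuristic of our own, not the rules FixXMLQuotes actually uses, and it works on a plain string without handling XML markup. The names `curl_quotes` and `OPENERS` are hypothetical. A quote is treated as opening when it follows the start of the text, whitespace, or an opening bracket, dash, or quote; a single quote between a letter or digit and a letter is treated as an apostrophe.

```python
# Illustrative heuristic only; not the rules FixXMLQuotes actually uses.
LEFT_DQ, RIGHT_DQ = "\u201c", "\u201d"  # Unicode 8220, 8221
LEFT_SQ, RIGHT_SQ = "\u2018", "\u2019"  # Unicode 8216, 8217 (8217 also serves as the apostrophe)

# Characters after which a quote mark is taken to open a quotation.
OPENERS = "([{\u2014" + LEFT_DQ + LEFT_SQ


def _opens_quotation(prev: str) -> bool:
    """True if a quote following `prev` should be an opening quote."""
    return prev == "" or prev.isspace() or prev in OPENERS


def curl_quotes(text: str) -> str:
    """Replace straight quotes (34 and 39) with curly ones using local context."""
    out: list[str] = []
    for i, ch in enumerate(text):
        prev = out[-1] if out else ""  # previous character, already converted
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == '"':
            out.append(LEFT_DQ if _opens_quotation(prev) else RIGHT_DQ)
        elif ch == "'":
            if prev.isalnum() and nxt.isalpha():
                out.append(RIGHT_SQ)  # apostrophe inside a word: don't, o'clock
            elif _opens_quotation(prev):
                out.append(LEFT_SQ)
            else:
                out.append(RIGHT_SQ)  # closing quote or trailing apostrophe
        else:
            out.append(ch)
    return "".join(out)
```

For example, `curl_quotes('"Don\'t," she said.')` yields `“Don’t,” she said.`, but `curl_quotes("'Tis")` starts with a left single quote where an apostrophe is wanted. Errors of this kind are why the output needs manual review.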

Usage:

fixxmlquotes softtags.txt jumptags.txt outputdirectory input1.xml input2.xml ...

where

  • softtags.txt specifies a text file containing a list of soft XML tags, one per line. A sample is included in the MorphAdorner distribution.
  • jumptags.txt specifies a text file containing a list of jump XML tags, one per line. A sample is included in the MorphAdorner distribution.
  • outputdirectory specifies the output directory to receive the XML files with corrected quote marks.
  • input*.xml specifies the input TEI XML files.
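If you want to inspect or generate the tag-list files with your own tooling, a minimal Python sketch for loading one is shown below. It assumes the files are UTF-8 encoded and that blank lines and surrounding whitespace can be ignored; those are assumptions, not documented behavior of MorphAdorner. The helper names `parse_tag_list` and `read_tag_list` are hypothetical.

```python
from pathlib import Path


def parse_tag_list(text: str) -> set[str]:
    """Parse one tag name per line, ignoring blank lines and surrounding whitespace."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def read_tag_list(path: str) -> set[str]:
    """Read a tag-list file such as softtags.txt or jumptags.txt (assumed UTF-8)."""
    return parse_tag_list(Path(path).read_text(encoding="utf-8"))
```

Returning a set makes membership checks such as `tag in soft_tags` cheap when scanning many elements.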

For each of the input XML files, FixXMLQuotes attempts to correct the quotes and writes a corrected XML file of the same name in the specified output directory.
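Because each corrected file keeps its input's file name, two inputs with the same name in different directories would map to the same output path. A minimal Python sketch of the naming rule, with a collision check you might run before a batch, is shown below. The functions `output_path_for` and `plan_outputs` are hypothetical helpers, not part of MorphAdorner.

```python
from pathlib import Path


def output_path_for(input_file: str, output_dir: str) -> Path:
    """The corrected file keeps the input's file name but goes in output_dir."""
    return Path(output_dir) / Path(input_file).name


def plan_outputs(input_files: list[str], output_dir: str) -> dict[str, Path]:
    """Map each input to its output path, refusing two inputs with the same name."""
    plan: dict[str, Path] = {}
    claimed: dict[Path, str] = {}
    for input_file in input_files:
        target = output_path_for(input_file, output_dir)
        if target in claimed:
            raise ValueError(
                f"{input_file} and {claimed[target]} would both be written to {target}"
            )
        claimed[target] = input_file
        plan[input_file] = target
    return plan
```

For example, `plan_outputs(["a/x.xml", "b/x.xml"], "out")` raises a `ValueError` because both files would be written to `out/x.xml`.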

The companion FixQuotes program provides the same approach to correcting quote marks, but for plain text files instead of XML files.

Usage:

fixquotes input.txt output.txt

where

  • input.txt specifies the input text file with quote marks to correct.
  • output.txt specifies the output text file with quote marks fixed.
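Since the output of either program still needs manual correction, it can help to locate likely problem spots first. The Python sketch below is a hypothetical review aid, not part of MorphAdorner: it splits the text into blank-line-separated paragraphs and flags those that still contain a straight double quote or whose left and right curly double quotes do not balance. It will also flag legitimate cases, such as a quotation running over several paragraphs where only the last paragraph has a closing quote.

```python
LEFT_DQ, RIGHT_DQ = "\u201c", "\u201d"  # Unicode 8220, 8221


def paragraphs_to_review(text: str) -> list[tuple[int, str]]:
    """Return (starting line number, paragraph) pairs whose double quotes look suspect."""
    flagged: list[tuple[int, str]] = []
    start, buf = None, []
    lines = text.splitlines() + [""]  # sentinel blank line flushes the last paragraph
    for number, line in enumerate(lines, start=1):
        if line.strip():
            if start is None:
                start = number
            buf.append(line)
        elif buf:
            para = "\n".join(buf)
            # A leftover straight quote or an unbalanced pair is worth a look.
            if '"' in para or para.count(LEFT_DQ) != para.count(RIGHT_DQ):
                flagged.append((start, para))
            start, buf = None, []
    return flagged
```

Working per paragraph rather than per line avoids false alarms in hard-wrapped text, where a quotation often spans a line break.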

At best, fixxmlquotes and fixquotes correct 90% of the quote marks. The remainder must be corrected manually.
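To estimate how well the automatic pass did on your own texts, you can compare its output with a hand-corrected copy. The Python sketch below is one hypothetical way to do this; it is not how the 90% figure above was obtained. It assumes the two texts are aligned character by character, which holds when manual correction only swaps one quote character for another, and it reports the fraction of quote-mark positions where the automatic output already matches.

```python
QUOTE_CHARS = set("\"'\u2018\u2019\u201c\u201d")  # straight and curly quotes


def quote_accuracy(automatic: str, corrected: str) -> float:
    """Fraction of quote positions where the automatic output matches the hand-corrected text."""
    if len(automatic) != len(corrected):
        raise ValueError("texts must be aligned character by character")
    positions = [i for i, ch in enumerate(corrected) if ch in QUOTE_CHARS]
    if not positions:
        return 1.0  # nothing to get wrong
    correct = sum(automatic[i] == corrected[i] for i in positions)
    return correct / len(positions)
```

For example, if the automatic output turned the apostrophe in "'Tis" into a left single quote but got the other two quote marks right, the score is 2/3.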
