Northwestern University Information Technology
MorphAdorner Northwestern
 
Extracting text from a TEI XML file

ExtractTEIText applies an XSL transformation to an input TEI XML file to extract the text from the body of the file.

Usage:

extractteitext input.xml output.txt

where

input.xml    The input TEI XML file.
output.txt   The output file containing the text extracted from the input TEI file.

The XSLT transformation used to extract the text is defined in the tei2text.xsl file in the xslt directory of the MorphAdorner release. This transformation works well for unadorned TEI files but less well for adorned files. To extract text from an adorned file, first use the Unadorn utility to remove the adornments, then run ExtractTEIText on the unadorned result.
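To illustrate what "extracting the text from the body" means, here is a minimal sketch in Python. It is not the tei2text.xsl transformation and does not reproduce its output format. It swaps in a plain tree walk using the standard xml.etree.ElementTree module. The function name extract_body_text, the sample document, and the whitespace collapsing are all assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by TEI P5 documents. Older files may have no namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"


def extract_body_text(tei_xml: str) -> str:
    """Return the text inside the first <body> element of a TEI document.

    Hypothetical helper for illustration only. It approximates the idea
    of body-text extraction and is not the MorphAdorner implementation.
    """
    root = ET.fromstring(tei_xml)

    # Try the namespaced <body> first, then fall back to an unqualified one.
    body = root.find(f".//{{{TEI_NS}}}body")
    if body is None:
        body = root.find(".//body")
    if body is None:
        return ""

    # itertext() yields every text node under <body> in document order.
    # Collapsing runs of whitespace is a choice made for this sketch.
    return " ".join("".join(body.itertext()).split())


# A tiny made-up TEI document: the header text must not appear in the output.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><titleStmt><title>Sample</title></titleStmt></fileDesc></teiHeader>
  <text><body><p>Hello, <hi>TEI</hi> world.</p></body></text>
</TEI>"""

if __name__ == "__main__":
    print(extract_body_text(SAMPLE))  # prints: Hello, TEI world.
```

The sketch keeps only text found under the body element, so header content such as the title is left out. An adorned file carries extra markup around each word, which is one reason a simple extraction like this one is better suited to unadorned input.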
