Northwestern University Information Technology
FindTeiTextLanguage determines the language(s) in which a TEI text is written.
findteitextlanguage output.tab input1.xml input2.xml ...
The output file is a tab-delimited UTF-8 text file containing the following fields, in order.
For texts with fewer than three recognizable languages, the missing language names are left blank and their scores are set to zero.
Language recognizer scores range from 0.0 (not a match) to 1.0 (perfect match). A non-negligible score for the second or third language indicates a potential problem for processing, unless the words in the secondary language are marked up in the TEI document.
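As an illustration of how the scores might be screened, the following Python sketch lists the files whose second or third language score exceeds a cutoff. It is not part of MorphAdorner. The column layout (file name followed by three language/score pairs), the function names, and the 0.1 cutoff are all assumptions made for this example, and the sample rows are invented; adjust the column indexes to match the actual output fields.

```python
import csv
import io

# ASSUMED column layout (not confirmed by the documentation):
#   0: file name, 1-2: first language and score,
#   3-4: second language and score, 5-6: third language and score.
SECONDARY_THRESHOLD = 0.1  # hypothetical cutoff for a "non-negligible" score


def read_rows(stream):
    """Parse tab-delimited text into lists of fields."""
    return list(csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE))


def flag_mixed_language(rows, threshold=SECONDARY_THRESHOLD):
    """Return file names whose second or third score exceeds the threshold."""
    flagged = []
    for row in rows:
        if len(row) < 7:
            continue  # skip short or malformed lines
        try:
            score2, score3 = float(row[4]), float(row[6])
        except ValueError:
            continue  # skip a header line or a non-numeric score
        if score2 > threshold or score3 > threshold:
            flagged.append(row[0])
    return flagged


if __name__ == "__main__":
    # Invented sample data in the assumed layout.
    sample = (
        "a.xml\tenglish\t0.95\tlatin\t0.30\tfrench\t0.02\n"
        "b.xml\tenglish\t0.99\t\t0\t\t0\n"
    )
    print(flag_mixed_language(read_rows(io.StringIO(sample))))
```

To screen a real output file, open it with `encoding="utf-8"` and pass the file object to `read_rows` in place of the in-memory sample.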
Academic Technologies and Research Services,
NU Library 2East, 1970 Campus Drive, Evanston, IL 60208.