Class | Description |
---|---|
FindTEITextLanguage | Find languages for TEI-encoded text. |
FindTEITextLanguage.DocData | Hold language recognition results for one document. |

Determines the language(s) in which a TEI text is written.
Usage:
java edu.northwestern.at.morphadorner.tools.findteitextlanguage.FindTEITextLanguage output.tab input1.xml input2.xml ...
output.tab -- output tab-separated values file described below.
input*.xml -- input TEI XML files whose language is to be found.
The output file is a tab-delimited UTF-8 text file containing the following fields, in order.
For texts with fewer than three recognizable languages, the missing language names are left blank and their scores are set to zero.
Language recognizer scores range from 0.0 (not a match) to 1.0 (perfect match).
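
The padding rule and score range described above can be illustrated with a minimal sketch. It is not taken from the MorphAdorner source: the `LanguagePadding` class, the `LanguageScore` type, and the `padToThree` method are hypothetical names, the input is assumed to be already ordered from best to worst match, and the sample language name and score are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of MorphAdorner.
final class LanguagePadding {
    /** One recognizer result: a language name and its score in [0.0, 1.0]. */
    static final class LanguageScore {
        final String name;
        final double score;

        LanguageScore(String name, double score) {
            this.name = name;
            this.score = score;
        }
    }

    /** Number of language slots reported per document, per the text above. */
    static final int SLOTS = 3;

    /**
     * Returns exactly SLOTS entries: up to the first SLOTS recognized
     * languages (assumed sorted best first), followed by blank names
     * with a score of 0.0 for any slots left unfilled.
     */
    static List<LanguageScore> padToThree(List<LanguageScore> recognized) {
        List<LanguageScore> padded = new ArrayList<>();
        for (int i = 0; i < SLOTS; i++) {
            if (i < recognized.size()) {
                padded.add(recognized.get(i));
            } else {
                padded.add(new LanguageScore("", 0.0));
            }
        }
        return padded;
    }

    public static void main(String[] args) {
        // Illustrative input: only one language was recognized.
        List<LanguageScore> recognized = new ArrayList<>();
        recognized.add(new LanguageScore("english", 0.92));

        // Print each slot as a tab-separated name and score pair.
        for (LanguageScore ls : padToThree(recognized)) {
            System.out.println(ls.name + "\t" + ls.score);
        }
    }
}
```

A reader of the output file can therefore treat a blank language name with a zero score as "no further language recognized" rather than as a recognized language that matched poorly.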