Northwestern University Information Technology
MorphAdorner V2.0
The Text Creation Partnership (TCP) transcriptions do not record line breaks in the printed originals. They do, however, record "soft" hyphens where a word straddles two lines. The pipe character (vertical bar) marks such a line break, as in "wind|ing".
Word breaks at line endings are not always marked with a hyphen in the printed originals. Transcribers were asked to supply missing soft hyphens with a '+' sign, but they did so inconsistently. Unmarked word breaks, especially in marginal notes, are a very common feature of the TCP texts.
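As a small illustration of the marking described above, the following sketch splits a token at the pipe marker. It assumes only that a soft hyphen appears as a single '|' inside a token, as in "wind|ing"; the class and method names are hypothetical and are not part of MorphAdorner.

```java
import java.util.Optional;

// Hypothetical helper, not a MorphAdorner class.
final class SoftHyphenToken {
    // Assumption: '|' marks a soft hyphen inside a token, as in "wind|ing".
    static final char MARK = '|';

    /** Returns the two halves of a token split at the first marker, if it has one. */
    static Optional<String[]> splitAtMark(String token) {
        int i = token.indexOf(MARK);
        if (i <= 0 || i == token.length() - 1) {
            // No marker, or the marker sits at an edge and leaves an empty half.
            return Optional.empty();
        }
        return Optional.of(new String[] {
            token.substring(0, i), token.substring(i + 1)
        });
    }
}
```

A token without a marker yields an empty result, so callers can pass every token through the same method.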
The soft hyphens of the SGML transcriptions of the printed texts are treated according to the following protocol after conversion to TEI XML format.
This replacement algorithm is implemented by a sequence of utilities that run after all the XML files are tokenized. Tokenizing first is necessary to obtain the complete list of tokens, which is used to determine how often a word appears with or without a real hyphen in the corpus. These utilities are applied only to TCP texts and are not particularly useful in general.
edu.northwestern.at.morphadorner.tools.tcp.CountDividedWords
edu.northwestern.at.morphadorner.tools.tcp.FindSoftHyphens, then edu.northwestern.at.morphadorner.tools.tcp.ExtractSoftHyphens
edu.northwestern.at.morphadorner.tools.tcp.FixWordBreaks
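The idea of consulting corpus-wide counts can be sketched roughly as follows. This is not the code of the utilities named above, and the decision rule is an assumption made for illustration: a soft-hyphenated token is replaced by its hyphenated form only if that form occurs more often in the corpus than the joined form. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not a MorphAdorner class.
final class SoftHyphenSketch {
    /** Counts each token that carries no soft-hyphen marker, ignoring case. */
    static Map<String, Integer> countForms(List<String> tokens) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : tokens) {
            if (t.indexOf('|') < 0) {
                counts.merge(t.toLowerCase(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /**
     * Assumed rule: pick the hyphenated form only when it is strictly more
     * frequent in the corpus than the joined form; otherwise join the halves.
     */
    static String resolve(String token, Map<String, Integer> counts) {
        int i = token.indexOf('|');
        if (i < 0) {
            return token; // nothing to resolve
        }
        String left = token.substring(0, i);
        String right = token.substring(i + 1);
        String joined = left + right;
        String hyphenated = left + "-" + right;
        int joinedCount = counts.getOrDefault(joined.toLowerCase(), 0);
        int hyphenCount = counts.getOrDefault(hyphenated.toLowerCase(), 0);
        return hyphenCount > joinedCount ? hyphenated : joined;
    }
}
```

Counting has to finish over the whole token list before any token is resolved, which is why a sketch like this needs two passes.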
Academic Technologies and Research Services,
NU Library 2East, 1970 Campus Drive, Evanston, IL 60208.