MorphAdorner Northwestern
 
Gap Filler

The text of many older works may not be clearly readable because of faded print, ink blotches, foxing, or other degradations of the printed source. Transcribers mark these unreadable sections in digital text copies using special characters or tag sequences. In TEI, the <gap> tag serves to mark sections of a text which cannot be transcribed because of problems in reading the original source.

It may be useful to try to repair individual damaged words by examining which letters appear in the same positions as the unreadable letters across a set of related texts. In essence, this is the same as finding the missing letters of a word in a crossword puzzle. In some cases there is only a single plausible reconstruction for a damaged word. More often there are several possible reconstructions.

MorphAdorner implements a "gap filler" algorithm that looks at all the words in a given lexicon that do not contain gaps and tries to find potential matches for a word containing individual letter gaps. MorphAdorner uses a trie structure to hold all the words without gaps, which supports fast searches for words that contain unknown letters.
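
The idea of searching a trie for words with unknown letters can be illustrated with a minimal Python sketch. This is not MorphAdorner's own code or API: the names GapTrie and fill, the use of "?" as the marker for one unreadable letter, and the tiny lexicon are all assumptions made for illustration. At a known letter the search follows a single branch of the trie; at a gap it tries every branch available at that position.

```python
from typing import Dict, Iterable, List

GAP = "?"  # assumed placeholder for a single unreadable letter


class TrieNode:
    """One trie node: child nodes keyed by letter, plus an end-of-word flag."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


class GapTrie:
    """Hypothetical trie that holds only the words without gaps."""

    def __init__(self, words: Iterable[str]) -> None:
        self.root = TrieNode()
        for word in words:
            if GAP not in word:  # damaged words are never stored
                self.insert(word)

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def fill(self, pattern: str) -> List[str]:
        """Return all stored words that match pattern, sorted alphabetically."""
        matches: List[str] = []

        def walk(node: TrieNode, i: int, prefix: str) -> None:
            if i == len(pattern):
                if node.is_word:
                    matches.append(prefix)
                return
            ch = pattern[i]
            if ch == GAP:
                # Unknown letter: explore every branch at this position.
                for letter, child in node.children.items():
                    walk(child, i + 1, prefix + letter)
            else:
                # Known letter: follow the single matching branch, if any.
                child = node.children.get(ch)
                if child is not None:
                    walk(child, i + 1, prefix + ch)

        walk(self.root, 0, "")
        return sorted(matches)


lexicon = ["love", "live", "lose", "lone", "move", "loved"]
trie = GapTrie(lexicon)
print(trie.fill("l?ve"))  # ['live', 'love']
print(trie.fill("lov?"))  # ['love']
```

In this toy lexicon "lov?" has a single reconstruction while "l?ve" has two, which mirrors the two cases described above. A real gap filler would also need some way to rank competing candidates, which this sketch does not attempt.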

You can try MorphAdorner's gap filler online.
