Poets that lasting marble seek,
Must carve in Latin or in Greek.
We write in sand, our language grows,
And like the tide, our work o'erflows.

-- Edmund Waller



Northwestern
MorphAdorner
  Text Segmenter
 
 

Text segmentation methods try to break a text into thematically coherent segments. MorphAdorner implements two linear segmentation methods that use measures of lexical cohesion to produce segments: Marti Hearst's TextTiler and Freddy Choi's C99. Both look for the points in a text at which the vocabulary shifts from one subtopic to another. These change points mark the boundaries of the text segments.
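To make the idea of lexical cohesion concrete, here is a minimal Python sketch of a cohesion-based linear segmenter, loosely in the spirit of Hearst's approach. It is an illustration only, not MorphAdorner's implementation and not C99. The function names, the `block_size` parameter, and the boundary cutoff (mean depth plus half a standard deviation) are all assumptions chosen for this example. The sketch compares the vocabulary of the sentences on either side of each sentence gap, then places a boundary where the similarity dips well below its neighboring peaks.

```python
import math
import re
import statistics
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def gap_similarities(sentences, block_size=2):
    """Similarity of the sentence blocks on each side of every gap."""
    bags = [Counter(tokenize(s)) for s in sentences]
    sims = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block_size):gap], Counter())
        right = sum(bags[gap:gap + block_size], Counter())
        sims.append(cosine(left, right))
    return sims


def climb(sims, start, step):
    """Walk away from `start` while similarity keeps rising; return the peak."""
    peak = sims[start]
    i = start + step
    while 0 <= i < len(sims) and sims[i] >= peak:
        peak = sims[i]
        i += step
    return peak


def depth_scores(sims):
    """Depth of each similarity valley relative to the peaks around it."""
    return [climb(sims, i, -1) + climb(sims, i, 1) - 2 * s
            for i, s in enumerate(sims)]


def segment(sentences, block_size=2):
    """Split sentences into segments at unusually deep similarity valleys."""
    if len(sentences) < 2:
        return [list(sentences)]
    depths = depth_scores(gap_similarities(sentences, block_size))
    # Assumed cutoff for this sketch: mean depth plus half a standard deviation.
    cutoff = statistics.mean(depths) + 0.5 * statistics.pstdev(depths)
    segments, start = [], 0
    for i, depth in enumerate(depths):
        if depth > cutoff:
            segments.append(sentences[start:i + 1])
            start = i + 1
    segments.append(sentences[start:])
    return segments


# Hypothetical sample text: three sentences on one subtopic, three on another.
SAMPLE = [
    "cats chase mice",
    "mice fear cats",
    "cats hunt mice",
    "stocks rise fast",
    "bonds rise when stocks fall",
    "bonds follow stocks",
]
```

Calling `segment(SAMPLE)` splits the six sample sentences between the third and fourth, where the two blocks share no vocabulary at all. Real texts would need stop-word removal, stemming or lemmatization, and a cutoff tuned to the material.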

Segmentation methods have traditionally been applied to non-fiction discursive texts. We are interested in investigating whether they can also illuminate the thematic structure of a wider range of genres, in both fiction and non-fiction.

You can try MorphAdorner's linear text segmenters online.

 

Academic Technologies  NU Library 2East  1970 Campus Drive  Evanston, IL 60208
E-mail: pib@northwestern.edu
Last updated Wed Mar 25 16:31:04 2009. © 2007, 2008 Northwestern University