edu.northwestern.at.utils.corpuslinguistics.sentencesplitter
Class DefaultSentenceSplitter

java.lang.Object
  extended by edu.northwestern.at.utils.IsCloseableObject
      extended by edu.northwestern.at.utils.corpuslinguistics.sentencesplitter.AbstractSentenceSplitter
          extended by edu.northwestern.at.utils.corpuslinguistics.sentencesplitter.ICU4JBreakIteratorSentenceSplitter
              extended by edu.northwestern.at.utils.corpuslinguistics.sentencesplitter.DefaultSentenceSplitter
All Implemented Interfaces:
SentenceSplitter, IsCloseable, UsesLogger

public class DefaultSentenceSplitter
extends ICU4JBreakIteratorSentenceSplitter
implements SentenceSplitter

Splits text into sentences.

Uses a BreakIterator, inherited through ICU4JBreakIteratorSentenceSplitter, to identify candidate sentences. Several heuristics then correct the candidate boundaries when a sentence appears to end with an abbreviation or a closing bracket character (right parenthesis, right bracket, or right brace).
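
The two-stage approach described above (candidate boundaries from a BreakIterator, then heuristic correction) can be illustrated with a minimal sketch. This is not the code of DefaultSentenceSplitter: it assumes the JDK's java.text.BreakIterator as a stand-in (ICU4J's com.ibm.icu.text.BreakIterator has a similar API), and the class name BreakIteratorSketch, the ABBREVIATIONS set, and both helper methods are hypothetical names introduced only for illustration. It covers only the abbreviation case, not the bracket heuristics.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class BreakIteratorSketch {

    // Hypothetical abbreviation list; the real class presumably uses a much larger resource.
    private static final Set<String> ABBREVIATIONS =
        new HashSet<>(Arrays.asList("Dr.", "Mr.", "Mrs.", "etc."));

    // Stage 1: ask a sentence BreakIterator for candidate sentence boundaries.
    static List<String> candidateSentences(String text) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(Locale.US);
        iterator.setText(text);

        List<String> candidates = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next();
             end != BreakIterator.DONE;
             start = end, end = iterator.next()) {
            String candidate = text.substring(start, end).trim();
            if (!candidate.isEmpty()) {
                candidates.add(candidate);
            }
        }
        return candidates;
    }

    // Stage 2: a simple heuristic that joins a candidate to the next one
    // when its last token is a known abbreviation.
    static List<String> mergeAfterAbbreviations(List<String> candidates) {
        List<String> merged = new ArrayList<>();
        StringBuilder pending = new StringBuilder();

        for (String candidate : candidates) {
            if (pending.length() > 0) {
                pending.append(' ');
            }
            pending.append(candidate);

            // Last whitespace-delimited token of this candidate.
            String lastWord = candidate.substring(candidate.lastIndexOf(' ') + 1);
            if (!ABBREVIATIONS.contains(lastWord)) {
                merged.add(pending.toString());
                pending.setLength(0);
            }
        }
        // Keep any trailing text that ended on an abbreviation.
        if (pending.length() > 0) {
            merged.add(pending.toString());
        }
        return merged;
    }

    public static void main(String[] args) {
        String text = "I saw Dr. Smith today. He left early.";
        List<String> candidates = candidateSentences(text);
        for (String sentence : mergeAfterAbbreviations(candidates)) {
            System.out.println(sentence);
        }
    }
}
```

Whether the first stage actually breaks after an abbreviation such as "Dr." depends on the BreakIterator implementation and its rules, which is why the second stage is written to be harmless when no false break occurred.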


Field Summary
 
Fields inherited from class edu.northwestern.at.utils.corpuslinguistics.sentencesplitter.AbstractSentenceSplitter
disallowedSentenceStarters, logger, names, partOfSpeechGuesser, sentenceSplitterIterator, wordTokenizer
 
Constructor Summary
DefaultSentenceSplitter()
           
 
Method Summary
 
Methods inherited from class edu.northwestern.at.utils.corpuslinguistics.sentencesplitter.AbstractSentenceSplitter
addSentence, addSentenceBad, extractSentences, extractSentences, findSentenceOffsets, fixUpSentence, getLogger, isNoun, isPronoun, isProperNoun, isVerb, quoteOnlySentence, setLogger, setPartOfSpeechGuesser, setSentenceSplitterIterator, splitSentenceWordList, verbSeen
 
Methods inherited from class edu.northwestern.at.utils.IsCloseableObject
close
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 
Methods inherited from interface edu.northwestern.at.utils.corpuslinguistics.sentencesplitter.SentenceSplitter
extractSentences, extractSentences, findSentenceOffsets, setPartOfSpeechGuesser, setSentenceSplitterIterator
 
Methods inherited from interface edu.northwestern.at.utils.IsCloseable
close
 

Constructor Detail

DefaultSentenceSplitter

public DefaultSentenceSplitter()