public class ICU4JBreakIteratorSentenceSplitter extends AbstractSentenceSplitter implements SentenceSplitter
Uses the ICU4J BreakIterator to identify candidate sentences, then applies several heuristics to correct the initial sentence identification.
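The two-stage approach described above (break iterator first, heuristic correction second) can be sketched roughly as follows. This is a minimal illustration and not this class's implementation. It uses the JDK's `java.text.BreakIterator` as a stand-in so that the example runs without extra dependencies; ICU4J's `com.ibm.icu.text.BreakIterator` offers the same `getSentenceInstance`, `setText`, `first`, and `next` calls. The class name `BreakIteratorSketch`, both method names, and the abbreviation-merging rule are assumptions made for illustration; the heuristics this class actually applies are not documented here.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch class; not part of the documented API.
final class BreakIteratorSketch {

    // Stage 1: ask the break iterator for candidate sentence boundaries.
    static List<String> candidateSentences(String text, Locale locale) {
        BreakIterator iterator = BreakIterator.getSentenceInstance(locale);
        iterator.setText(text);

        List<String> candidates = new ArrayList<>();
        int start = iterator.first();
        for (int end = iterator.next();
             end != BreakIterator.DONE;
             start = end, end = iterator.next()) {
            String candidate = text.substring(start, end).trim();
            if (!candidate.isEmpty()) {
                candidates.add(candidate);
            }
        }
        return candidates;
    }

    // Stage 2 (assumed heuristic): if a candidate ends with a known
    // abbreviation, treat the break as spurious and join the next
    // candidate onto it.
    static List<String> mergeAfterAbbreviations(List<String> candidates,
                                                Set<String> abbreviations) {
        List<String> merged = new ArrayList<>();
        boolean joinNext = false;
        for (String candidate : candidates) {
            if (joinNext && !merged.isEmpty()) {
                int last = merged.size() - 1;
                merged.set(last, merged.get(last) + " " + candidate);
            } else {
                merged.add(candidate);
            }
            // Inspect the final whitespace-delimited token of this candidate.
            String lastToken = candidate.substring(candidate.lastIndexOf(' ') + 1);
            joinNext = abbreviations.contains(lastToken);
        }
        return merged;
    }
}
```

In this sketch, passing the candidates `"He met Dr."`, `"Smith today."`, and `"It rained."` with the abbreviation set `{"Dr."}` yields two sentences, the first being `"He met Dr. Smith today."`.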
Fields inherited from class AbstractSentenceSplitter: abbreviations, disallowedSentenceStarters, logger, names, partOfSpeechGuesser, sentenceSplitterIterator, wordTokenizer
Constructor and Description |
---|
ICU4JBreakIteratorSentenceSplitter(): Creates a sentence splitter based on the ICU4J BreakIterator. |
ICU4JBreakIteratorSentenceSplitter(java.util.Locale locale): Creates a sentence splitter based on the ICU4J BreakIterator for the given locale. |
Methods inherited from class AbstractSentenceSplitter: addSentence, addSentenceBad, extractSentences, extractSentences, findSentenceOffsets, fixUpSentence, getLogger, isClosingPunctuationOnly, isNoun, isPronoun, isProperNoun, isVerb, quoteOnlySentence, setAbbreviations, setLogger, setPartOfSpeechGuesser, setSentenceSplitterIterator, splitSentenceWordList, verbSeen
close
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface SentenceSplitter: extractSentences, extractSentences, findSentenceOffsets, setAbbreviations, setPartOfSpeechGuesser, setSentenceSplitterIterator
close
public ICU4JBreakIteratorSentenceSplitter()
public ICU4JBreakIteratorSentenceSplitter(java.util.Locale locale)