public interface SentenceSplitter
Modifier and Type | Method and Description |
---|---|
java.util.List<java.util.List<java.lang.String>> | extractSentences(java.lang.String text): Break text into sentences and tokens. |
java.util.List<java.util.List<java.lang.String>> | extractSentences(java.lang.String text, WordTokenizer tokenizer): Break text into sentences and tokens. |
int[] | findSentenceOffsets(java.lang.String text, java.util.List<java.util.List<java.lang.String>> sentences): Find starting offsets of sentences extracted from a text. |
void | setAbbreviations(Abbreviations abbreviations): Set abbreviations. |
void | setPartOfSpeechGuesser(PartOfSpeechGuesser partOfSpeechGuesser): Set part of speech guesser. |
void | setSentenceSplitterIterator(SentenceSplitterIterator sentenceSplitterIterator): Set sentence splitter iterator. |
void setPartOfSpeechGuesser(PartOfSpeechGuesser partOfSpeechGuesser)
partOfSpeechGuesser - Part of speech guesser.
A sentence splitter may use part of speech information to disambiguate end-of-sentence boundary conditions. The part of speech guesser provides access to the lexicons and guessing algorithms for determining the possible parts of speech for a word without performing a full part of speech tagging operation.
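As a rough illustration of how part of speech information could help at a sentence boundary, the sketch below treats a period after an abbreviation as a non-boundary when the following word can only be a proper noun (as in "Dr. Smith"). Everything here is an assumption made for illustration: the class `BoundaryDisambiguationSketch`, the `TagGuesser` stand-in, the tag names "NP" and "PRON", and the heuristic itself are hypothetical and are not taken from PartOfSpeechGuesser or from any actual splitter implementation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration; none of these names come from the documented API.
final class BoundaryDisambiguationSketch {

    // Stand-in for a part of speech guesser: maps a word to its possible tags
    // without running a full tagger.
    interface TagGuesser {
        Set<String> possibleTags(String word);
    }

    // Decides whether a period after `before` ends a sentence, given the next word.
    // Assumed heuristic: an abbreviation followed by a word that can only be a
    // proper noun (tag "NP") does not end the sentence.
    static boolean isSentenceEnd(String before, String next,
                                 Set<String> abbreviations, TagGuesser guesser) {
        if (!abbreviations.contains(before.toLowerCase())) {
            return true; // ordinary word followed by a period
        }
        Set<String> tags = guesser.possibleTags(next);
        boolean onlyProperNoun = tags.size() == 1 && tags.contains("NP");
        return !onlyProperNoun;
    }

    // Toy guesser: "Smith" can only be a proper noun; any other word may be
    // a proper noun or a pronoun.
    static TagGuesser toyGuesser() {
        return word -> {
            Set<String> tags = new HashSet<>();
            tags.add("NP");
            if (!word.equals("Smith")) {
                tags.add("PRON");
            }
            return tags;
        };
    }

    public static void main(String[] args) {
        Set<String> abbreviations = new HashSet<>(Arrays.asList("dr", "mr"));
        TagGuesser guesser = toyGuesser();
        System.out.println(isSentenceEnd("Dr", "Smith", abbreviations, guesser));  // false
        System.out.println(isSentenceEnd("rained", "We", abbreviations, guesser)); // true
        System.out.println(isSentenceEnd("Dr", "We", abbreviations, guesser));     // true
    }
}
```

The point of the sketch is only that a cheap lookup of possible tags can be enough to resolve an ambiguous period; a real splitter would likely combine several such cues.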
void setSentenceSplitterIterator(SentenceSplitterIterator sentenceSplitterIterator)
sentenceSplitterIterator - Sentence splitter iterator.
void setAbbreviations(Abbreviations abbreviations)
abbreviations - Abbreviations.
java.util.List<java.util.List<java.lang.String>> extractSentences(java.lang.String text, WordTokenizer tokenizer)
text - Text to break into sentences and tokens.
tokenizer - Tokenizer to use for breaking sentences into words.
Word tokens may be words, numbers, punctuation, etc.
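To show the shape of the returned value (a list of sentences, each a list of word tokens), here is a minimal self-contained sketch of a splitter with a pluggable tokenizer. It is not the library's implementation: the class `ExtractSentencesSketch`, the use of `java.util.function.Function` in place of WordTokenizer, and the naive punctuation-based splitting rule are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in; the documented interface and WordTokenizer are not reproduced here.
final class ExtractSentencesSketch {

    // Naive splitter: a sentence is a run of text ending in '.', '!' or '?'
    // (or at the end of the text). Each sentence is handed to the tokenizer.
    static List<List<String>> extractSentences(
            String text, Function<String, List<String>> tokenizer) {
        List<List<String>> sentences = new ArrayList<>();
        Matcher m = Pattern.compile("[^.!?]+[.!?]*").matcher(text);
        while (m.find()) {
            String sentence = m.group().trim();
            if (!sentence.isEmpty()) {
                sentences.add(tokenizer.apply(sentence));
            }
        }
        return sentences;
    }

    // Toy tokenizer: runs of letters or digits form one token; any other
    // non-space character (punctuation, symbols) is a token of its own.
    static List<String> simpleTokens(String sentence) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+|\\S").matcher(sentence);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<List<String>> result = extractSentences(
            "It rained. We stayed in!", ExtractSentencesSketch::simpleTokens);
        System.out.println(result); // [[It, rained, .], [We, stayed, in, !]]
    }
}
```

Passing the tokenizer as a parameter mirrors the two-argument overload: the sentence boundaries stay the same while the caller decides how each sentence is broken into words.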
java.util.List<java.util.List<java.lang.String>> extractSentences(java.lang.String text)
text - Text to break into sentences and tokens.
Word tokens may be words, numbers, punctuation, etc. The default word tokenizer is used.
int[] findSentenceOffsets(java.lang.String text, java.util.List<java.util.List<java.lang.String>> sentences)
text - Text from which sentences were extracted.
sentences - List of sentences (each a list of words) extracted from text.
N.B. If the sentences aren't from the specified text, the resulting offsets will be meaningless.
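To make the relationship between the extracted sentences and their offsets concrete, the sketch below shows one way such offsets could be computed by scanning the text for each token in order. This is an assumed approach, not the documented implementation: the class `SentenceOffsetsSketch`, the one-offset-per-sentence array length, and the use of -1 for a sentence whose tokens cannot be located are hypothetical choices.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of the documented API.
final class SentenceOffsetsSketch {

    // For each sentence, returns the index in text where the first token that
    // can be located begins, or -1 if none of its tokens is found.
    // Assumes the tokens occur verbatim in text, in order.
    static int[] findSentenceOffsets(String text, List<List<String>> sentences) {
        int[] offsets = new int[sentences.size()];
        int cursor = 0; // position in text just past the last matched token
        for (int i = 0; i < sentences.size(); i++) {
            offsets[i] = -1;
            for (String token : sentences.get(i)) {
                int found = text.indexOf(token, cursor);
                if (found < 0) {
                    continue; // token not present after cursor: skip it
                }
                if (offsets[i] < 0) {
                    offsets[i] = found; // first located token marks the sentence start
                }
                cursor = found + token.length();
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "It rained. We stayed in.";
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("It", "rained", "."),
            Arrays.asList("We", "stayed", "in", "."));
        System.out.println(Arrays.toString(findSentenceOffsets(text, sentences))); // [0, 11]
    }
}
```

Because the scan only moves forward, sentences that were extracted from a different text either fail to match or match at unrelated positions, which is the situation the N.B. above warns about.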