public interface SentenceSplitter
Interface for splitting text into sentences.
| Method Summary | |
|---|---|
| java.util.List<java.util.List<java.lang.String>> | extractSentences(java.lang.String text) Break text into sentences and tokens. |
| java.util.List<java.util.List<java.lang.String>> | extractSentences(java.lang.String text, WordTokenizer tokenizer) Break text into sentences and tokens. |
| int[] | findSentenceOffsets(java.lang.String text, java.util.List<java.util.List<java.lang.String>> sentences) Find starting offsets of sentences extracted from a text. |
| void | setPartOfSpeechGuesser(PartOfSpeechGuesser partOfSpeechGuesser) Set part of speech guesser. |
| void | setSentenceSplitterIterator(SentenceSplitterIterator sentenceSplitterIterator) Set sentence splitter iterator. |
| Method Detail |
|---|
void setPartOfSpeechGuesser(PartOfSpeechGuesser partOfSpeechGuesser)
partOfSpeechGuesser - Part of speech guesser.
A sentence splitter may use part of speech information to disambiguate end-of-sentence boundary conditions. The part of speech guesser provides access to the lexicons and guessing algorithms for determining the possible parts of speech for a word without performing a full part of speech tagging operation.
void setSentenceSplitterIterator(SentenceSplitterIterator sentenceSplitterIterator)
sentenceSplitterIterator - Sentence splitter iterator.
java.util.List<java.util.List<java.lang.String>> extractSentences(java.lang.String text,
WordTokenizer tokenizer)
text - Text to break into sentences and tokens.
tokenizer - Tokenizer to use for breaking sentences into words.
Word tokens may be words, numbers, punctuation, etc.
java.util.List<java.util.List<java.lang.String>> extractSentences(java.lang.String text)
text - Text to break into sentences and tokens.
Word tokens may be words, numbers, punctuation, etc. The default word tokenizer is used.
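The nested list returned by both extractSentences overloads holds one inner list of tokens per sentence. The sketch below shows one way to walk that shape. It uses a hand-built result instead of calling a real splitter, and the class SentenceResultSketch and its helper methods are hypothetical names that are not part of this API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only; not part of the documented API.
final class SentenceResultSketch {

    // Count all word tokens across every sentence.
    static int countTokens(List<List<String>> sentences) {
        int total = 0;
        for (List<String> sentence : sentences) {
            total += sentence.size();
        }
        return total;
    }

    // Render one sentence per line, with tokens separated by single spaces.
    static String render(List<List<String>> sentences) {
        StringBuilder sb = new StringBuilder();
        for (List<String> sentence : sentences) {
            sb.append(String.join(" ", sentence)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hand-built stand-in for what extractSentences(text) might return.
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("Hello", "world", "."),
            Arrays.asList("How", "are", "you", "?"));
        System.out.println(countTokens(sentences));
        System.out.print(render(sentences));
    }
}
```

Punctuation appears as separate tokens here because word tokens may be words, numbers, or punctuation.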
int[] findSentenceOffsets(java.lang.String text,
java.util.List<java.util.List<java.lang.String>> sentences)
text - Text from which sentences were extracted.
sentences - List of sentences (each a list of words) extracted from text.
N.B. If the sentences aren't from the specified text, the resulting offsets will be meaningless.
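To make the offset computation concrete, here is a minimal sketch of one way starting offsets could be recovered: scan the text left to right, locating each token after the previous one. This is an illustration under stated assumptions, not the implementation behind this interface. The class name SentenceOffsetsSketch is hypothetical, and the sketch assumes one offset per sentence with -1 for a sentence whose tokens cannot be found; the layout of the array returned by an actual implementation may differ.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; not the library's implementation.
final class SentenceOffsetsSketch {

    // Assumption: returns one starting offset per sentence, or -1 if no token
    // of that sentence can be located in the text at or after the cursor.
    static int[] findSentenceOffsets(String text, List<List<String>> sentences) {
        int[] offsets = new int[sentences.size()];
        int cursor = 0; // position in text just past the last matched token
        for (int i = 0; i < sentences.size(); i++) {
            offsets[i] = -1;
            for (String token : sentences.get(i)) {
                int found = text.indexOf(token, cursor);
                if (found < 0) {
                    continue; // token not present from here on; skip it
                }
                if (offsets[i] < 0) {
                    offsets[i] = found; // first located token marks the sentence start
                }
                cursor = found + token.length();
            }
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "Hello world. How are you?";
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("Hello", "world", "."),
            Arrays.asList("How", "are", "you", "?"));
        System.out.println(Arrays.toString(findSentenceOffsets(text, sentences)));
    }
}
```

When the sentences do not come from the given text, the tokens are not found and the sketch yields -1 entries, which mirrors the caveat that such offsets carry no meaning.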