public class EccoPreTokenizer extends AbstractPreTokenizer implements PreTokenizer
Modifier and Type | Field and Description |
---|---|
protected static PatternReplacer | doubleBackTicksReplacer: Double back-ticks. |
protected static java.lang.String | EccoAlwaysSeparators |
protected static PatternReplacer | singleBackTicksReplacer: Single back-tick followed by a capital letter. |
protected static PatternReplacer | wordOrSpanGapReplacer: Word or span gap. |
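The three `PatternReplacer` fields suggest that the pretokenizer rewrites Ecco-specific back-tick marks (presumably opening quotation marks) with regular expressions before tokenizing. The sketch below models that idea with plain `java.util.regex`. `BackTickReplacerSketch`, `SimplePatternReplacer`, both regexes, and the choice to pad each match with spaces are assumptions for illustration, not the actual `PatternReplacer` API or MorphAdorner's patterns.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not MorphAdorner code: SimplePatternReplacer stands in
// for PatternReplacer as "a compiled regex plus a replacement string".
public class BackTickReplacerSketch {

    static final class SimplePatternReplacer {
        private final Pattern pattern;
        private final String replacement;

        SimplePatternReplacer(String regex, String replacement) {
            this.pattern = Pattern.compile(regex);
            this.replacement = replacement;
        }

        // Replaces every match of the pattern in the text.
        String replace(String text) {
            return pattern.matcher(text).replaceAll(replacement);
        }
    }

    // Assumption: two consecutive back-ticks are padded with spaces
    // so they become a token of their own.
    static final SimplePatternReplacer DOUBLE_BACK_TICKS =
        new SimplePatternReplacer("``", " `` ");

    // Assumption: a single back-tick directly before a capital letter is
    // split off from the word. The lookbehind stops the second back-tick
    // of a `` pair from matching.
    static final SimplePatternReplacer SINGLE_BACK_TICK =
        new SimplePatternReplacer("(?<!`)`(?=[A-Z])", " ` ");

    public static void main(String[] args) {
        String line = "He said ``Stay'' and `Go' to me.";
        System.out.println(SINGLE_BACK_TICK.replace(DOUBLE_BACK_TICKS.replace(line)));
    }
}
```

A back-tick before a lowercase letter (as in `don`t`) is left untouched by the single back-tick pattern.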
Fields inherited from class AbstractPreTokenizer: alwaysSeparators, alwaysSeparatorsReplacer, asterisks, commaSeparator, commaSeparatorReplacer, hyphens, logger, periods
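`EccoAlwaysSeparators` is a `String`, and the inherited `alwaysSeparators` / `alwaysSeparatorsReplacer` pair suggests that a string of characters that always separate tokens is turned into a replacer. A minimal sketch of that conversion follows, assuming each separator character should simply be padded with spaces. `SeparatorPaddingSketch` and the sample separator set `()[];` are hypothetical; the real separator characters are not listed in this documentation.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not MorphAdorner code: turns a string of separator
// characters into a regex and pads every occurrence with spaces.
public class SeparatorPaddingSketch {

    // Builds an alternation of individually quoted characters so that each
    // one is matched literally, even regex metacharacters such as '(' or '['.
    static Pattern separatorPattern(String separators) {
        if (separators.isEmpty()) {
            throw new IllegalArgumentException("separators must not be empty");
        }
        StringBuilder regex = new StringBuilder();
        for (char c : separators.toCharArray()) {
            if (regex.length() > 0) {
                regex.append('|');
            }
            regex.append(Pattern.quote(String.valueOf(c)));
        }
        return Pattern.compile(regex.toString());
    }

    // "$0" re-inserts the whole match, here surrounded by spaces.
    static String padSeparators(String text, Pattern separators) {
        return separators.matcher(text).replaceAll(" $0 ");
    }

    public static void main(String[] args) {
        Pattern separators = separatorPattern("()[];"); // hypothetical set
        System.out.println(padSeparators("a(b)c;", separators));
    }
}
```

Quoting one character at a time avoids escaping rules inside a regex character class, at the cost of a slightly longer pattern.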
Constructor and Description |
---|
EccoPreTokenizer(): Create an Ecco pretokenizer. |
Modifier and Type | Method and Description |
---|---|
java.lang.String | pretokenize(java.lang.String line): Prepare text for tokenization. |
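Methods inherited from class AbstractPreTokenizer: getLogger, setLogger
Methods inherited from a superclass of AbstractPreTokenizer: close
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from an implemented interface: close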
protected static final java.lang.String EccoAlwaysSeparators
protected static final PatternReplacer wordOrSpanGapReplacer
protected static final PatternReplacer doubleBackTicksReplacer
protected static final PatternReplacer singleBackTicksReplacer
public java.lang.String pretokenize(java.lang.String line)
Prepare text for tokenization.
Specified by: pretokenize in interface PreTokenizer
Overrides: pretokenize in class AbstractPreTokenizer
Parameters: line - The text to prepare for tokenization.
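The documentation says only that `pretokenize` prepares text for tokenization. One plausible shape, sketched here under stated assumptions, is a fixed sequence of regex rewrites followed by whitespace cleanup. `PretokenizePipelineSketch`, its step list, and the final whitespace collapse are hypothetical and are not taken from `EccoPreTokenizer`.

```java
import java.util.List;
import java.util.function.UnaryOperator;
import java.util.regex.Pattern;

// Hypothetical sketch, not MorphAdorner code: models pretokenize(line) as an
// ordered list of string rewrites followed by whitespace normalization.
public class PretokenizePipelineSketch {

    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private final List<UnaryOperator<String>> steps;

    PretokenizePipelineSketch(List<UnaryOperator<String>> steps) {
        this.steps = List.copyOf(steps);
    }

    String pretokenize(String line) {
        String result = line;
        for (UnaryOperator<String> step : steps) {
            result = step.apply(result);
        }
        // Collapse runs of whitespace introduced by padding and trim the ends.
        return WHITESPACE.matcher(result).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        // Assumed steps mirroring the back-tick replacers described above.
        List<UnaryOperator<String>> steps = List.of(
            s -> s.replace("``", " `` "),
            s -> s.replaceAll("(?<!`)`(?=[A-Z])", " ` "));
        PretokenizePipelineSketch p = new PretokenizePipelineSketch(steps);
        System.out.println(p.pretokenize("He said ``Stay'' and `Go' to me."));
    }
}
```

Collapsing whitespace once at the end lets each step pad its matches generously without producing doubled spaces in the result.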