public class EEBOPostTokenizer extends AbstractPostTokenizer implements PostTokenizer
This post tokenizer processes tokens extracted from EEBO corpus texts. It removes soft hyphens and regularizes some EEBO-specific tagging. It can be used with EEBO texts in either their original format or TEIAnalytics format.
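The soft hyphen removal mentioned above can be sketched as follows. This is only an illustration, assuming the soft hyphen is the Unicode character U+00AD; the class `SoftHyphenSketch` and its method `stripSoftHyphens` are hypothetical names and are not part of this class's API, and the EEBO-specific tag regularization is not shown.

```java
// Hypothetical helper for illustration only; not part of EEBOPostTokenizer.
final class SoftHyphenSketch {
    /** Unicode SOFT HYPHEN (U+00AD), assumed to be the character being removed. */
    static final String SOFT_HYPHEN = "\u00AD";

    /** Returns the token with every soft hyphen removed. */
    static String stripSoftHyphens(String token) {
        return token.replace(SOFT_HYPHEN, "");
    }

    public static void main(String[] args) {
        // A token carrying a soft hyphen between "king" and "dom".
        System.out.println(stripSoftHyphens("king\u00ADdom"));
    }
}
```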
logger
Constructor and Description |
---|
EEBOPostTokenizer() Create an EEBO PostTokenizer. |
Modifier and Type | Method and Description |
---|---|
java.lang.String[] | postTokenize(java.lang.String token) Process a token after tokenization. |
getLogger, setLogger
close
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
close
public java.lang.String[] postTokenize(java.lang.String token)
postTokenize in interface PostTokenizer
postTokenize in class AbstractPostTokenizer
token
- The token to process after tokenization.
Returns the minimally processed and maximally processed forms of the token.
Minimal processing typically yields the original spelling.
Maximal processing typically yields a partially or completely standardized spelling.
The two results may be identical.
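A caller might consume the returned array roughly as shown below. This is a sketch under stated assumptions: the array is assumed to hold the minimally processed token at index 0 and the maximally processed token at index 1, which this page does not confirm. `PostTokenizerSketch` and `PostTokenizeUsageSketch` are hypothetical stand-ins so the example compiles without the library. In real code an `EEBOPostTokenizer` instance would take the place of the stand-in.

```java
// Hypothetical stand-in for the PostTokenizer interface (only the one method shown).
interface PostTokenizerSketch {
    String[] postTokenize(String token);
}

// Hypothetical caller; not part of the documented API.
final class PostTokenizeUsageSketch {
    /**
     * Summarizes the result of post-tokenization.
     * Assumed layout: [0] = minimally processed, [1] = maximally processed.
     */
    static String describe(PostTokenizerSketch tokenizer, String token) {
        String[] result = tokenizer.postTokenize(token);
        String minimal = result[0];
        String maximal = result[1];
        // The two forms may be identical, so report only one in that case.
        return minimal.equals(maximal) ? minimal : minimal + " -> " + maximal;
    }

    public static void main(String[] args) {
        // Illustrative stand-in: strips soft hyphens and returns the same
        // string for both forms. It does not reproduce the real class's logic.
        PostTokenizerSketch standIn = t -> {
            String cleaned = t.replace("\u00AD", "");
            return new String[] { cleaned, cleaned };
        };
        System.out.println(describe(standIn, "Lon\u00ADdon"));
    }
}
```

Keeping both forms lets downstream code retain the original spelling while matching on the standardized one.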