public class NGramExtractor
extends java.lang.Object
Modifier and Type | Field and Description |
---|---|
protected java.util.Map<java.lang.String,java.lang.Integer> | nGramCounts - The ngrams and their associated counts. |
(package private) int | nGramSize - Number of words forming an ngram. |
protected int | numberOfNGrams - Total number of ngrams. |
(package private) int | windowSize - Window size within which to search for ngrams. |
Constructor and Description |
---|
NGramExtractor(int nGramSize, int windowSize) - Creates an ngram extractor with the given ngram size and window size. |
Modifier and Type | Method and Description |
---|---|
void | addWords(java.util.List<java.lang.String> wordList) - Adds words from a list of words. |
void | addWords(java.lang.String[] words) - Adds words from a string array of words. |
int | getNGramCount(java.lang.String ngram) - Returns the count for a specific ngram. |
java.util.Map<java.lang.String,java.lang.Integer> | getNGramMap() - Returns the ngram map. |
java.lang.String[] | getNGrams() - Returns the ngrams. |
int | getNumberOfNGrams() - Returns the total number of ngrams. |
int | getNumberOfUniqueNGrams() - Returns the number of unique ngrams. |
void | mergeNGramExtractor(NGramExtractor extractor) - Merges ngrams from another NGramExtractor. |
static java.lang.String[] | splitNGramIntoWords(java.lang.String ngram) - Returns the individual words comprising an ngram. |
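int nGramSize
Number of words forming an ngram.
int windowSize
Window size within which to search for ngrams.
protected java.util.Map<java.lang.String,java.lang.Integer> nGramCounts
The ngrams and their associated counts. Each key is an ngram string; each value is the Integer count for that ngram.
The ngram string consists of two or more words separated by tab characters ("\t").
protected int numberOfNGrams
Total number of ngrams.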
public NGramExtractor(int nGramSize, int windowSize)
nGramSize - The number of words forming an ngram.
windowSize - The window size (number of words) within which to construct ngrams.
Example: nGramSize=2, windowSize=3, text="a quick brown fox".
The first window is "a quick brown". The ngrams are "a quick", "a brown", and "quick brown".
The second window is "quick brown fox". The ngrams are "quick brown", "quick fox", and "brown fox".
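The constructor example can be reproduced with a small stand-alone sketch. It is not the class's actual source: the class name WindowedNGramSketch and the method extract are hypothetical. It assumes each ngram is anchored on its first word (that word plus nGramSize - 1 later words, all within windowSize words of it, kept in text order), so a combination that falls inside two overlapping windows, such as "quick brown" in the example, is counted once. Whether NGramExtractor counts such overlaps once or once per window is not stated above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical sketch of windowed ngram extraction; not the NGramExtractor source.
 * Assumption: each ngram is anchored on its first word, so overlapping windows
 * do not count the same word combination twice.
 */
public class WindowedNGramSketch {

    /** Counts ngrams in words, using tab-separated keys as described for nGramCounts. */
    public static Map<String, Integer> extract(String[] words, int nGramSize, int windowSize) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int start = 0; start < words.length; start++) {
            // Exclusive upper bound of the window that begins at "start".
            int end = Math.min(words.length, start + windowSize);
            List<String> current = new ArrayList<>();
            current.add(words[start]);
            collect(words, start + 1, end, nGramSize, current, counts);
        }
        return counts;
    }

    /** Recursively chooses the remaining words of an ngram from positions [from, end). */
    private static void collect(String[] words, int from, int end, int nGramSize,
                                List<String> current, Map<String, Integer> counts) {
        if (current.size() == nGramSize) {
            // Record the completed ngram under its tab-joined key.
            counts.merge(String.join("\t", current), 1, Integer::sum);
            return;
        }
        for (int i = from; i < end; i++) {
            current.add(words[i]);
            collect(words, i + 1, end, nGramSize, current, counts);
            current.remove(current.size() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        String[] text = {"a", "quick", "brown", "fox"};
        Map<String, Integer> counts = extract(text, 2, 3);
        // Show the tab separator as a space for readability.
        counts.forEach((k, v) -> System.out.println(k.replace('\t', ' ') + " = " + v));
    }
}
```

Running main prints the five distinct ngrams from the example ("a brown", "a quick", "brown fox", "quick brown", "quick fox"), each with count 1, in TreeMap key order. Counting once per window instead would give "quick brown" a count of 2; the sketch makes the anchored choice only because it avoids double counting.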
public void addWords(java.lang.String[] words)
words - The string array with the words.
public void addWords(java.util.List<java.lang.String> wordList)
wordList - The list with the words.
public void mergeNGramExtractor(NGramExtractor extractor)
extractor - The extractor whose ngrams are merged into this one.
public int getNGramCount(java.lang.String ngram)
ngram - The ngram whose count is desired.
public java.lang.String[] getNGrams()
public java.util.Map<java.lang.String,java.lang.Integer> getNGramMap()
public int getNumberOfNGrams()
public int getNumberOfUniqueNGrams()
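The fields distinguish numberOfNGrams (total occurrences) from the entries in nGramCounts (distinct ngrams), which is what separates getNumberOfNGrams from getNumberOfUniqueNGrams. The sketch below illustrates that bookkeeping together with a merge. NGramCountsSketch and its add and merge methods are hypothetical, and two behaviors are assumptions not stated above: merging adds the other extractor's counts to this one, and an unseen ngram has count 0.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the count bookkeeping; not the NGramExtractor source. */
public class NGramCountsSketch {
    private final Map<String, Integer> nGramCounts = new HashMap<>();
    private int numberOfNGrams = 0; // total occurrences, duplicates included

    /** Records one occurrence of a tab-separated ngram key. */
    public void add(String ngram) {
        nGramCounts.merge(ngram, 1, Integer::sum);
        numberOfNGrams++;
    }

    /** Assumed merge semantics: add the other sketch's counts and total into this one. */
    public void merge(NGramCountsSketch other) {
        other.nGramCounts.forEach((k, v) -> nGramCounts.merge(k, v, Integer::sum));
        numberOfNGrams += other.numberOfNGrams;
    }

    /** Assumption: an ngram that was never seen has count 0. */
    public int getNGramCount(String ngram) {
        return nGramCounts.getOrDefault(ngram, 0);
    }

    public int getNumberOfNGrams() {
        return numberOfNGrams;
    }

    /** Distinct ngrams are simply the map's keys. */
    public int getNumberOfUniqueNGrams() {
        return nGramCounts.size();
    }

    public static void main(String[] args) {
        NGramCountsSketch a = new NGramCountsSketch();
        a.add("quick\tbrown");
        a.add("brown\tfox");
        NGramCountsSketch b = new NGramCountsSketch();
        b.add("quick\tbrown");
        a.merge(b);
        System.out.println(a.getNumberOfNGrams());           // 3
        System.out.println(a.getNumberOfUniqueNGrams());     // 2
        System.out.println(a.getNGramCount("quick\tbrown")); // 2
    }
}
```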
public static java.lang.String[] splitNGramIntoWords(java.lang.String ngram)
ngram - The ngram to parse.
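Because nGramCounts keys are words separated by tab characters, splitting an ngram back into its words amounts to splitting on tabs. The helper below is named after the documented static method, but its body is an assumption (the real method may handle edge cases differently); joinWords and the class NGramKeySketch are hypothetical. It also assumes individual words never contain a tab.

```java
import java.util.Arrays;

/** Hypothetical helpers illustrating the tab-separated ngram key format. */
public class NGramKeySketch {

    /** Joins words into an ngram key, separating them with tab characters. */
    public static String joinWords(String... words) {
        return String.join("\t", words);
    }

    /** Splits an ngram key back into its words (assumes words contain no tabs). */
    public static String[] splitNGramIntoWords(String ngram) {
        // A tab is not a regex metacharacter, so String.split matches it literally.
        return ngram.split("\t");
    }

    public static void main(String[] args) {
        String key = joinWords("quick", "brown");
        System.out.println(Arrays.toString(splitNGramIntoWords(key))); // [quick, brown]
    }
}
```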