public class MultiwordUnitData
extends java.lang.Object
Holds data on counts and association measure values for one multiword unit.
Modifier and Type | Field and Description |
---|---|
protected double | dice |
protected NGramExtractor[] | extractors |
protected java.lang.String | leftSuccessorPattern |
protected double | logLikelihood |
protected java.lang.String | mwu |
protected int | mwuCount |
protected int | mwuLength |
protected double | phiSquared |
protected java.lang.String | rightSuccessorPattern |
protected double | scp |
protected double | si |
protected double | sigLogLikelihood |
protected int | totalWordCount |
protected java.util.Map<java.lang.String,java.lang.Integer> | wordCountMap |
protected int[] | wordCounts |
protected java.lang.String[] | words |
Constructor and Description |
---|
MultiwordUnitData(java.lang.String mwu, java.util.Map<java.lang.String,java.lang.Integer> wordCountMap, int totalWordCount, NGramExtractor[] extractors) |
Modifier and Type | Method and Description |
---|---|
void | calculateAssociationMeasures(): Calculate the association measures. |
double | freq(java.lang.String[] words, int i1, int i2): Calculate the frequency for a portion of a set of words. |
protected double | getAvp(): Get the fair dispersion point normalization. |
protected double | getAvp2(): Get the fair dispersion point normalization. |
double | getAvx(): Calculate fair probability for the left hand side of a pseudo-bigram. |
double | getAvy(): Calculate fair probability for the right hand side of a pseudo-bigram. |
double | getDice(): Return the Dice coefficient. |
double | getLogLikelihood(): Return log likelihood. |
java.lang.String | getMWUText(): Get the multiword unit text. |
int | getMWUTextCount(): Get the count for this multiword unit text. |
int | getMWUTextLength(): Get the number of words in this multiword unit. |
double | getPhiSquared(): Return phi squared. |
double | getSCP(): Return the symmetric conditional probability. |
double | getSI(): Return the specific mutual information. |
double | getSigLogLikelihood(): Return significance of log likelihood. |
int | getWordCount(java.lang.String word): Get count for a specific word from the count map. |
int[] | getWordCounts(): Get the count for each word in this multiword unit. |
java.lang.String[] | getWords(): Get the words in this multiword unit. |
java.lang.String | leftAntecedent(): Get the left antecedent of the current multiword unit. |
java.lang.String[] | leftSuccessors(): Get the left successors of the current multiword unit. |
double | prob(java.lang.String[] words, int i1, int i2): Calculate the probability for a portion of a set of words. |
java.lang.String | rightAntecedent(): Get the right antecedent of the current multiword unit. |
java.lang.String[] | rightSuccessors(): Get the right successors of the current multiword unit. |
java.lang.String[] | successors(): Get the successors of the current multiword unit. |
java.lang.String | toString(): Return mwu as a displayable string. |
protected java.lang.String mwu
protected int mwuCount
protected int mwuLength
protected java.lang.String[] words
protected int[] wordCounts
protected double dice
protected double logLikelihood
protected double phiSquared
protected double scp
protected double si
protected double sigLogLikelihood
protected NGramExtractor[] extractors
protected java.lang.String leftSuccessorPattern
protected java.lang.String rightSuccessorPattern
protected int totalWordCount
protected java.util.Map<java.lang.String,java.lang.Integer> wordCountMap
public MultiwordUnitData(java.lang.String mwu, java.util.Map<java.lang.String,java.lang.Integer> wordCountMap, int totalWordCount, NGramExtractor[] extractors)
public java.lang.String getMWUText()
public int getMWUTextCount()
public int getMWUTextLength()
public java.lang.String[] getWords()
public int[] getWordCounts()
public java.lang.String leftAntecedent()
public java.lang.String rightAntecedent()
public java.lang.String[] successors()
public java.lang.String[] leftSuccessors()
public java.lang.String[] rightSuccessors()
public double getAvx()
public double getAvy()
protected double getAvp()
protected double getAvp2()
public void calculateAssociationMeasures()
public double prob(java.lang.String[] words, int i1, int i2)
Parameters:
words - The words.
i1 - Starting index.
i2 - Ending index.
The probability is the maximum likelihood estimate. For a single word, this is the number of times the word appears divided by the total number of words. For an n-gram, it is the number of times the n-gram appears divided by the total number of n-grams containing the same number of words.
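The maximum likelihood estimate described for prob can be sketched as follows. This is a minimal illustration, not the class's actual implementation: the class name MleProbabilitySketch, the two lookup maps, and the treatment of i2 as an inclusive index are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.Map;

/** Hypothetical helper; not part of MultiwordUnitData. */
final class MleProbabilitySketch {

    /**
     * Maximum likelihood estimate for words[i1..i2].
     * Assumption: i2 is inclusive, n-gram counts are keyed by the
     * space-joined text, and totalsByLength maps n to the total
     * number of n-grams of that length.
     */
    static double prob(String[] words, int i1, int i2,
                       Map<String, Integer> ngramCounts,
                       Map<Integer, Integer> totalsByLength) {
        String ngram = String.join(" ", Arrays.copyOfRange(words, i1, i2 + 1));
        int length = i2 - i1 + 1;
        int count = ngramCounts.getOrDefault(ngram, 0);
        int total = totalsByLength.getOrDefault(length, 0);
        // Guard against an empty corpus for this n-gram length.
        return total == 0 ? 0.0 : (double) count / total;
    }

    public static void main(String[] args) {
        String[] words = {"new", "york", "city"};
        Map<String, Integer> counts = Map.of("new", 40, "york", 25, "new york", 20);
        Map<Integer, Integer> totals = Map.of(1, 1000, 2, 999);
        System.out.println(prob(words, 0, 0, counts, totals)); // 40 / 1000
        System.out.println(prob(words, 0, 1, counts, totals)); // 20 / 999
    }
}
```

Dividing by the total for the same n-gram length keeps unigram and bigram estimates on separate scales, which matches the description above.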
public double freq(java.lang.String[] words, int i1, int i2)
Parameters:
words - The words.
i1 - Starting index.
i2 - Ending index.
public double getDice()
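For orientation, the textbook Dice coefficient for a two-word unit is twice the pair count divided by the sum of the two word counts. The sketch below shows only that plain bigram case; the class name DiceSketch and its parameters are hypothetical. How MultiwordUnitData extends the measure to longer units (the getAvx and getAvy methods suggest a pseudo-bigram approach) is not documented here and is not reproduced.

```java
/** Hypothetical helper; not part of MultiwordUnitData. */
final class DiceSketch {

    /**
     * Textbook Dice coefficient for a bigram:
     * 2 * count(x y) / (count(x) + count(y)).
     */
    static double dice(int pairCount, int leftCount, int rightCount) {
        int denominator = leftCount + rightCount;
        // Avoid division by zero when neither word was observed.
        return denominator == 0 ? 0.0 : (2.0 * pairCount) / denominator;
    }

    public static void main(String[] args) {
        // "new york" seen 20 times, "new" 40 times, "york" 25 times.
        System.out.println(dice(20, 40, 25)); // 40 / 65
    }
}
```

The value ranges from 0 (the words never co-occur) to 1 (the words only ever appear together).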
public double getLogLikelihood()
public double getPhiSquared()
public double getSCP()
public double getSI()
public double getSigLogLikelihood()
public int getWordCount(java.lang.String word)
Parameters:
word - The word text.
public java.lang.String toString()
Overrides: toString in class java.lang.Object