MultiwordUnitData (MorphAdorner)

java.lang.Object
- edu.northwestern.at.morphadorner.corpuslinguistics.multiwordunits.MultiwordUnitData

```
public class MultiwordUnitData
extends java.lang.Object
```
Multiword unit data.
Holds data on counts and association measure values for one multiword unit.

Field Summary

Fields
Modifier and Type	Field and Description
`protected double`	`dice`
`protected NGramExtractor[]`	`extractors`
`protected java.lang.String`	`leftSuccessorPattern`
`protected double`	`logLikelihood`
`protected java.lang.String`	`mwu`
`protected int`	`mwuCount`
`protected int`	`mwuLength`
`protected double`	`phiSquared`
`protected java.lang.String`	`rightSuccessorPattern`
`protected double`	`scp`
`protected double`	`si`
`protected double`	`sigLogLikelihood`
`protected int`	`totalWordCount`
`protected java.util.Map<java.lang.String,java.lang.Integer>`	`wordCountMap`
`protected int[]`	`wordCounts`
`protected java.lang.String[]`	`words`

Constructor Summary

Constructors
Constructor and Description
`MultiwordUnitData(java.lang.String mwu, java.util.Map<java.lang.String,java.lang.Integer> wordCountMap, int totalWordCount, NGramExtractor[] extractors)`

Method Summary

Methods
Modifier and Type	Method and Description
`void`	`calculateAssociationMeasures()` Calculate the association measures.
`double`	`freq(java.lang.String[] words, int i1, int i2)` Calculate the frequency for a portion of a set of words.
`protected double`	`getAvp()` Get the fair dispersion point normalization.
`protected double`	`getAvp2()` Get the fair dispersion point normalization.
`double`	`getAvx()` Calculate fair probability for the left hand side of a pseudo-bigram.
`double`	`getAvy()` Calculate fair probability for the right hand side of a pseudo-bigram.
`double`	`getDice()` Return the Dice coefficient.
`double`	`getLogLikelihood()` Return log likelihood.
`java.lang.String`	`getMWUText()` Get the multiword unit text.
`int`	`getMWUTextCount()` Get the count for this multiword unit text.
`int`	`getMWUTextLength()` Get the number of words in this multiword unit.
`double`	`getPhiSquared()` Return phi squared.
`double`	`getSCP()` Return the symmetric conditional probability.
`double`	`getSI()` Return the specific mutual information.
`double`	`getSigLogLikelihood()` Return significance of log likelihood.
`int`	`getWordCount(java.lang.String word)` Get count for a specific word from the count map.
`int[]`	`getWordCounts()` Get the count for each word in this multiword unit.
`java.lang.String[]`	`getWords()` Get the words in this multiword unit.
`java.lang.String`	`leftAntecedent()` Get the left antecedent of the current multiword unit.
`java.lang.String[]`	`leftSuccessors()` Get the left successors of the current multiword unit.
`double`	`prob(java.lang.String[] words, int i1, int i2)` Calculate the probability for a portion of a set of words.
`java.lang.String`	`rightAntecedent()` Get the right antecedent of the current multiword unit.
`java.lang.String[]`	`rightSuccessors()` Get the right successors of the current multiword unit.
`java.lang.String[]`	`successors()` Get the successors of the current multiword unit.
`java.lang.String`	`toString()` Return the multiword unit as a displayable string.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Field Detail
  - mwu
```
protected java.lang.String mwu
```
  - mwuCount
```
protected int mwuCount
```
  - mwuLength
```
protected int mwuLength
```
  - words
```
protected java.lang.String[] words
```
  - wordCounts
```
protected int[] wordCounts
```
  - dice
```
protected double dice
```
  - logLikelihood
```
protected double logLikelihood
```
  - phiSquared
```
protected double phiSquared
```
  - scp
```
protected double scp
```
  - si
```
protected double si
```
  - sigLogLikelihood
```
protected double sigLogLikelihood
```
  - extractors
```
protected NGramExtractor[] extractors
```
  - leftSuccessorPattern
```
protected java.lang.String leftSuccessorPattern
```
  - rightSuccessorPattern
```
protected java.lang.String rightSuccessorPattern
```
  - totalWordCount
```
protected int totalWordCount
```
  - wordCountMap
```
protected java.util.Map<java.lang.String,java.lang.Integer> wordCountMap
```
- Constructor Detail
  - MultiwordUnitData
```
public MultiwordUnitData(java.lang.String mwu,
                 java.util.Map<java.lang.String,java.lang.Integer> wordCountMap,
                 int totalWordCount,
                 NGramExtractor[] extractors)
```
- Method Detail
  - getMWUText
```
public java.lang.String getMWUText()
```
    Get the multiword unit text.
    
    Returns:
    The multiword unit text.
  - getMWUTextCount
```
public int getMWUTextCount()
```
    Get the count for this multiword unit text.
    
    Returns:
    Count of appearances of this multiword unit.
  - getMWUTextLength
```
public int getMWUTextLength()
```
    Get the number of words in this multiword unit.
    
    Returns:
    Number of words in this multiword unit.
  - getWords
```
public java.lang.String[] getWords()
```
    Get the words in this multiword unit.
    
    Returns:
    Words in this multiword unit.
  - getWordCounts
```
public int[] getWordCounts()
```
    Get the count for each word in this multiword unit.
    
    Returns:
    Count for each word in this multiword unit.
  - leftAntecedent
```
public java.lang.String leftAntecedent()
```
    Get the left antecedent of the current multiword unit.
    
    Returns:
    The left antecedent as a string.
  - rightAntecedent
```
public java.lang.String rightAntecedent()
```
    Get the right antecedent of the current multiword unit.
    
    Returns:
    The right antecedent as a string.
  - successors
```
public java.lang.String[] successors()
```
    Get the successors of the current multiword unit.
    
    Returns:
    The successors as an array of strings.
  - leftSuccessors
```
public java.lang.String[] leftSuccessors()
```
    Get the left successors of the current multiword unit.
    
    Returns:
    The left successors as an array of strings.
  - rightSuccessors
```
public java.lang.String[] rightSuccessors()
```
    Get the right successors of the current multiword unit.
    
    Returns:
    The right successors as an array of strings.
  - getAvx
```
public double getAvx()
```
    Calculate fair probability for the left hand side of a pseudo-bigram.
    
    Returns:
    Fair probability for left hand side of pseudo-bigram.
  - getAvy
```
public double getAvy()
```
    Calculate fair probability for the right hand side of a pseudo-bigram.
    
    Returns:
    Fair probability for right hand side of pseudo-bigram.
  - getAvp
```
protected double getAvp()
```
    Get the fair dispersion point normalization.
    
    Returns:
    Fair dispersion point normalization.
  - getAvp2
```
protected double getAvp2()
```
    Get the fair dispersion point normalization.
    
    Returns:
    Fair dispersion point normalization.
  - calculateAssociationMeasures
```
public void calculateAssociationMeasures()
```
    Calculate the association measures.
  - prob
```
public double prob(java.lang.String[] words,
          int i1,
          int i2)
```
    Calculate the probability for a portion of a set of words.
    
    Parameters:
    words - The words.
    i1 - Starting index.
    i2 - Ending index.
    
    Returns:
    Probability from word counts.
    
    The probability is the maximum likelihood estimate. For a single word, this is the number of times the word appears divided by the total number of words. For an n-gram, it is the number of times the n-gram appears divided by the total number of n-grams containing the same number of words.
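    The maximum likelihood estimate described for prob can be sketched as follows. This is a minimal illustration, not the MorphAdorner implementation: the class and method names are hypothetical, and the assumption that a contiguous text of N words contains N - n + 1 n-grams of length n is ours.

```java
// Illustrative sketch only; class and method names are hypothetical
// and are not part of MorphAdorner.
final class MleProbabilitySketch {
    /**
     * Maximum likelihood estimate: the number of occurrences divided by
     * the number of units (words or n-grams) of the same length.
     */
    static double mleProbability(int count, int totalUnits) {
        // Avoid division by zero for an empty corpus.
        if (totalUnits <= 0) {
            return 0.0;
        }
        return (double) count / totalUnits;
    }

    /**
     * Number of n-grams of length n in a text of totalWords tokens.
     * Assumption: the text is one contiguous token stream.
     */
    static int ngramTotal(int totalWords, int n) {
        return Math.max(totalWords - n + 1, 0);
    }

    public static void main(String[] args) {
        int totalWords = 13;
        // A single word seen 3 times among 13 words.
        System.out.println(mleProbability(3, ngramTotal(totalWords, 1)));
        // A bigram seen 3 times among the 12 bigrams of the same text.
        System.out.println(mleProbability(3, ngramTotal(totalWords, 2)));
    }
}
```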
  - freq
```
public double freq(java.lang.String[] words,
          int i1,
          int i2)
```
    Calculate the frequency for a portion of a set of words.
    
    Parameters:
    words - The words.
    i1 - Starting index.
    i2 - Ending index.
    
    Returns:
    Frequency from ngram frequencies.
  - getDice
```
public double getDice()
```
    Return the Dice coefficient.
    
    Returns:
    The Dice coefficient.
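    For orientation, the textbook Dice coefficient for a two-word unit is sketched below. The page does not state the exact formula the class uses, in particular for units longer than two words, so treat this as an assumption; all names are hypothetical.

```java
// Illustrative sketch only; the class and method names are hypothetical
// and are not part of MorphAdorner.
final class DiceSketch {
    /**
     * Textbook Dice coefficient for a bigram "x y":
     * 2 * f(xy) / (f(x) + f(y)), where f is a raw count.
     */
    static double dice(int pairCount, int leftCount, int rightCount) {
        int denominator = leftCount + rightCount;
        // Guard against division by zero when neither word occurs.
        if (denominator == 0) {
            return 0.0;
        }
        return 2.0 * pairCount / denominator;
    }

    public static void main(String[] args) {
        // Example counts: the pair seen 10 times, its words 20 and 30 times.
        System.out.println(dice(10, 20, 30));
    }
}
```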
  - getLogLikelihood
```
public double getLogLikelihood()
```
    Return log likelihood.
    
    Returns:
    The log likelihood.
  - getPhiSquared
```
public double getPhiSquared()
```
    Return phi squared.
    
    Returns:
    The phi squared value.
  - getSCP
```
public double getSCP()
```
    Return the symmetric conditional probability.
    
    Returns:
    The symmetric conditional probability.
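    As a point of reference, the standard symmetric conditional probability for a bigram is the product of the two conditional probabilities p(x|y) and p(y|x), which simplifies to p(xy)^2 / (p(x) p(y)). The sketch below assumes that definition; whether the class uses exactly this form is not stated on this page, and the names are hypothetical.

```java
// Illustrative sketch only; the class and method names are hypothetical
// and are not part of MorphAdorner.
final class ScpSketch {
    /**
     * Symmetric conditional probability for a bigram "x y":
     * p(x|y) * p(y|x) = p(xy)^2 / (p(x) * p(y)).
     */
    static double scp(double pairProb, double leftProb, double rightProb) {
        double denominator = leftProb * rightProb;
        // Guard against division by zero when either word never occurs.
        if (denominator == 0.0) {
            return 0.0;
        }
        return (pairProb * pairProb) / denominator;
    }

    public static void main(String[] args) {
        // Example probabilities chosen for illustration only.
        System.out.println(scp(0.1, 0.2, 0.5));
    }
}
```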
  - getSI
```
public double getSI()
```
    Return the specific mutual information.
    
    Returns:
    The specific mutual information.
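    Specific (pointwise) mutual information for a bigram is commonly defined as the logarithm of p(xy) / (p(x) p(y)). The sketch below assumes that definition with a base-2 logarithm; the page does not say which base or which generalization to longer units the class uses, and the names are hypothetical.

```java
// Illustrative sketch only; the class and method names are hypothetical
// and are not part of MorphAdorner.
final class SpecificMutualInformationSketch {
    /**
     * Specific mutual information for a bigram "x y":
     * log2( p(xy) / (p(x) * p(y)) ).
     * Assumption: base-2 logarithm and strictly positive probabilities.
     */
    static double si(double pairProb, double leftProb, double rightProb) {
        if (pairProb <= 0.0 || leftProb <= 0.0 || rightProb <= 0.0) {
            throw new IllegalArgumentException("probabilities must be positive");
        }
        double ratio = pairProb / (leftProb * rightProb);
        // Convert the natural logarithm to base 2.
        return Math.log(ratio) / Math.log(2.0);
    }

    public static void main(String[] args) {
        // The pair occurs twice as often as independence would predict.
        System.out.println(si(0.25, 0.5, 0.25));
    }
}
```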
  - getSigLogLikelihood
```
public double getSigLogLikelihood()
```
    Return significance of log likelihood.
    
    Returns:
    The significance of the log likelihood.
  - getWordCount
```
public int getWordCount(java.lang.String word)
```
    Get count for a specific word from the count map.
    
    Parameters:
    word - The word text.
    
    Returns:
    The count for the specified word. 0 if the word does not occur.
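    The documented behavior of getWordCount (return the stored count, or 0 when the word does not occur) can be sketched against a plain java.util.Map. This is a standalone illustration with hypothetical names, not the class's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only; the class and method names are hypothetical
// and are not part of MorphAdorner.
final class WordCountLookupSketch {
    /** Returns the count stored for the word, or 0 if the word is absent. */
    static int getWordCount(Map<String, Integer> wordCountMap, String word) {
        Integer count = wordCountMap.get(word);
        // A missing key (or a null value) is treated as zero occurrences.
        return count == null ? 0 : count;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("new", 20);
        counts.put("york", 30);
        System.out.println(getWordCount(counts, "york"));
        System.out.println(getWordCount(counts, "zebra"));
    }
}
```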
  - toString
```
public java.lang.String toString()
```
    Return the multiword unit as a displayable string.
    
    Overrides:
    toString in class java.lang.Object
    
    Returns:
    The multiword unit as a displayable string.
