TransitionMatrix (MorphAdorner)

java.lang.Object
- edu.northwestern.at.utils.IsCloseableObject
- - edu.northwestern.at.morphadorner.corpuslinguistics.postagger.transitionmatrix.TransitionMatrix

All Implemented Interfaces:

UsesLogger

Direct Known Subclasses:

DefaultTransitionMatrix, PennTreebankTransitionMatrix
```
public class TransitionMatrix
extends IsCloseableObject
implements UsesLogger
```
Probability transition matrix.
Holds the unigram, bigram, and trigram counts and probabilities.

Call calculateProbabilities() to calculate tag transition probabilities. Weights for the ngrams are computed using deleted interpolation.
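
The counting and probability steps described above can be sketched without the library. This is a minimal illustration, not the MorphAdorner implementation: the class `TagNGramCounts`, its tab-joined string keys, and its method names are hypothetical stand-ins for the `incrementCount` and `calculateProbabilities` workflow, and the probability shown is the plain maximum likelihood estimate.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for illustration; not the MorphAdorner class. */
final class TagNGramCounts {
    // Keys are tags joined by a tab, e.g. "DT\tNN" for a bigram (an assumption of this sketch).
    final Map<String, Integer> unigrams = new HashMap<>();
    final Map<String, Integer> bigrams = new HashMap<>();
    final Map<String, Integer> trigrams = new HashMap<>();
    int totalWords = 0;

    /** Count every unigram, bigram, and trigram in one tagged sentence. */
    void addSentence(List<String> tags) {
        for (int i = 0; i < tags.size(); i++) {
            String tag = tags.get(i);
            unigrams.merge(tag, 1, Integer::sum);
            totalWords++;
            if (i >= 1) {
                bigrams.merge(tags.get(i - 1) + "\t" + tag, 1, Integer::sum);
            }
            if (i >= 2) {
                String key = tags.get(i - 2) + "\t" + tags.get(i - 1) + "\t" + tag;
                trigrams.merge(key, 1, Integer::sum);
            }
        }
    }

    /** Maximum likelihood p(tag2 | tag1) = count(tag1, tag2) / count(tag1), or 0 if tag1 is unseen. */
    double bigramProbability(String tag1, String tag2) {
        int denominator = unigrams.getOrDefault(tag1, 0);
        if (denominator <= 0) {
            return 0.0;
        }
        return (double) bigrams.getOrDefault(tag1 + "\t" + tag2, 0) / denominator;
    }
}
```

Keeping counts and probabilities in separate structures, as the class does with its count and probability maps, lets the probabilities be recomputed whenever the counts change.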

Field Summary

Fields
Modifier and Type	Field and Description
`protected static int`	`BIGRAM`
`protected Map2D<java.lang.String,java.lang.String,java.lang.Integer>`	`bigramCountMap`
`protected Map2D<java.lang.String,java.lang.String,java.lang.Double>`	`bigramProbMap`
`protected double[]`	`bigramWeights` Bigram weights from deleted interpolation.
`protected static boolean`	`debug` True if debugging output enabled.
`protected boolean`	`haveProbabilities` True if probabilities calculated.
`protected Logger`	`logger` Logger used for output.
`protected int[]`	`totalNGrams` Total ngram tag counts.
`protected int`	`totalWords` Total number of words.
`protected static int`	`TRIGRAM`
`protected Map3D<java.lang.String,java.lang.String,java.lang.String,java.lang.Integer>`	`trigramCountMap`
`protected Map3D<java.lang.String,java.lang.String,java.lang.String,java.lang.Double>`	`trigramProbMap`
`protected double[]`	`trigramWeights` Trigram weights from deleted interpolation.
`protected static int`	`UNIGRAM` Constants for clarification.
`protected java.util.Map<java.lang.String,java.lang.Integer>`	`unigramCountMap` HashMaps with part of speech tags as the keys and counts as the values.
`protected java.util.Map<java.lang.String,java.lang.Double>`	`unigramProbMap` HashMaps with part of speech tags as the keys and transition probability as the values.
`protected int[]`	`uniqueNGrams` Unique ngram tag counts.

Constructor Summary

Constructors
Constructor and Description

TransitionMatrix()


Method Summary

Methods
Modifier and Type	Method and Description
`void`	`calculateProbabilities()` Calculate transition probabilities from counts.
`java.util.Set<java.lang.String>`	`columnKeySet()` Get column key set.
`protected void`	`computeBigramWeights()` Calculate bigram weights for contextual smoothing.
`protected void`	`computeTrigramWeights()` Calculate trigram weights for contextual smoothing.
`void`	`displayNGramCounts()` Display the ngram counts.
`double[]`	`getBigramWeights()` Return weights for bigrams using deleted interpolation.
`int`	`getCount(java.lang.String tag)` Look up unigram count.
`int`	`getCount(java.lang.String tag1, java.lang.String tag2)` Look up bigram count.
`int`	`getCount(java.lang.String tag1, java.lang.String tag2, java.lang.String tag3)` Look up trigram count.
`Logger`	`getLogger()` Get the logger.
`double`	`getProbability(java.lang.String tag)` Look up unigram probability.
`double`	`getProbability(java.lang.String tag1, java.lang.String tag2)` Look up bigram probability.
`double`	`getProbability(java.lang.String tag1, java.lang.String tag2, java.lang.String tag3)` Look up trigram probability.
`int`	`getTotalWordCount()` Get total number of words.
`double[]`	`getTrigramWeights()` Return weights for trigrams using deleted interpolation.
`void`	`incrementCount(java.lang.String tag, int increment)` Increment unigram tag count.
`void`	`incrementCount(java.lang.String tag1, java.lang.String tag2, int increment)` Increment bigram tag count.
`void`	`incrementCount(java.lang.String tag1, java.lang.String tag2, java.lang.String tag3, int increment)` Increment trigram tag count.
`void`	`loadTransitionMatrix(java.io.Reader reader, char delimChar)` Load transition matrix from a reader.
`void`	`loadTransitionMatrix(java.net.URL url, boolean compressed, java.lang.String encoding, char delimChar)` Load transition matrix from a URL.
`void`	`loadTransitionMatrix(java.net.URL url, java.lang.String encoding, char delimChar)` Load transition matrix from a URL.
`java.util.Set<java.lang.String>`	`rowKeySet()` Get row key set.
`double`	`safelyDivideCount(int numerator, int denominator)` Safely divide two counts.
`double`	`safelyDivideSmoothedCount(int numerator, int denominator)` Safely divide two counts after subtracting one from each.
`void`	`saveTransitionMatrix(java.lang.String transitionFileName, java.lang.String encoding, char delimChar)` Save transition matrix to a file.
`void`	`saveTransitionMatrix(java.io.Writer writer, char delimChar)` Save transition matrix to a writer.
`void`	`setLogger(Logger logger)` Set the logger.
`java.util.Set<java.lang.String>`	`sliceKeySet()` Get slice key set.

Methods inherited from class edu.northwestern.at.utils.IsCloseableObject
close

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - debug
```
protected static boolean debug
```
    True if debugging output enabled.
  - unigramCountMap
```
protected java.util.Map<java.lang.String,java.lang.Integer> unigramCountMap
```
    HashMaps with part of speech tags as the keys and counts as the values.
  - bigramCountMap
```
protected Map2D<java.lang.String,java.lang.String,java.lang.Integer> bigramCountMap
```
  - trigramCountMap
```
protected Map3D<java.lang.String,java.lang.String,java.lang.String,java.lang.Integer> trigramCountMap
```
  - unigramProbMap
```
protected java.util.Map<java.lang.String,java.lang.Double> unigramProbMap
```
    HashMaps with part of speech tags as the keys and transition probability as the values.
  - bigramProbMap
```
protected Map2D<java.lang.String,java.lang.String,java.lang.Double> bigramProbMap
```
  - trigramProbMap
```
protected Map3D<java.lang.String,java.lang.String,java.lang.String,java.lang.Double> trigramProbMap
```
  - totalNGrams
```
protected int[] totalNGrams
```
    Total ngram tag counts.
  - uniqueNGrams
```
protected int[] uniqueNGrams
```
    Unique ngram tag counts.
  - totalWords
```
protected int totalWords
```
    Total number of words.
  - haveProbabilities
```
protected boolean haveProbabilities
```
    True if probabilities calculated.
  - bigramWeights
```
protected double[] bigramWeights
```
    Bigram weights from deleted interpolation.
  - trigramWeights
```
protected double[] trigramWeights
```
    Trigram weights from deleted interpolation.
  - UNIGRAM
```
protected static final int UNIGRAM
```
    Constants for clarification.
    
    See Also:
    Constant Field Values
  - BIGRAM
```
protected static final int BIGRAM
```
    See Also:
    Constant Field Values
  - TRIGRAM
```
protected static final int TRIGRAM
```
    See Also:
    Constant Field Values
  - logger
```
protected Logger logger
```
    Logger used for output.
- Constructor Detail
  - TransitionMatrix
```
public TransitionMatrix()
```
- Method Detail
  - getLogger
```
public Logger getLogger()
```
    Get the logger.
    
    Specified by:
    
    getLogger in interface UsesLogger
    
    Returns:
    The logger.
  - setLogger
```
public void setLogger(Logger logger)
```
    Set the logger.
    
    Specified by:
    
    setLogger in interface UsesLogger
    
    Parameters:
    logger - The logger.
  - incrementCount
```
public void incrementCount(java.lang.String tag,
                  int increment)
```
    Increment unigram tag count.
    
    Parameters:
    tag - The part of speech tag.
    increment - The increment.
  - incrementCount
```
public void incrementCount(java.lang.String tag1,
                  java.lang.String tag2,
                  int increment)
```
    Increment bigram tag count.
    
    Parameters:
    tag1 - The first part of speech tag.
    tag2 - The second part of speech tag.
    increment - The increment.
  - incrementCount
```
public void incrementCount(java.lang.String tag1,
                  java.lang.String tag2,
                  java.lang.String tag3,
                  int increment)
```
    Increment trigram tag count.
    
    Parameters:
    tag1 - The first part of speech tag.
    tag2 - The second part of speech tag.
    tag3 - The third part of speech tag.
    increment - The increment.
  - safelyDivideCount
```
public double safelyDivideCount(int numerator,
                       int denominator)
```
    Safely divide two counts.
    
    Parameters:
    numerator - The undiscounted numerator value.
    denominator - The undiscounted denominator value.
    
    Returns:
    numerator / denominator, or 0 if denominator <= 0.
  - safelyDivideSmoothedCount
```
public double safelyDivideSmoothedCount(int numerator,
                               int denominator)
```
    Safely divide two counts after subtracting one from each.
    
    Parameters:
    numerator - The undiscounted numerator value.
    denominator - The undiscounted denominator value.
    
    Returns:
    (numerator - 1) / (denominator - 1), or 0 if (denominator - 1) <= 0.
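    
    The two return rules documented above can be written out as follows. This is a sketch derived only from the stated return values; the class name `SafeDivision` is hypothetical, and the methods are static here for brevity, whereas the documented ones are instance methods.

```java
/** Hypothetical illustration of the two documented division rules. */
final class SafeDivision {
    /** Returns numerator / denominator, or 0 if denominator <= 0. */
    static double safelyDivideCount(int numerator, int denominator) {
        return (denominator <= 0) ? 0.0 : (double) numerator / denominator;
    }

    /** Returns (numerator - 1) / (denominator - 1), or 0 if (denominator - 1) <= 0. */
    static double safelyDivideSmoothedCount(int numerator, int denominator) {
        int reducedDenominator = denominator - 1;
        if (reducedDenominator <= 0) {
            return 0.0;
        }
        // Cast before dividing so the result is not truncated to an integer.
        return (double) (numerator - 1) / reducedDenominator;
    }
}
```

    Under these rules, `safelyDivideSmoothedCount(3, 5)` evaluates (3 - 1) / (5 - 1), and any denominator of 1 or less yields 0 rather than a division by zero.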
  - calculateProbabilities
```
public void calculateProbabilities()
```
    Calculate transition probabilities from counts.
  - computeTrigramWeights
```
protected void computeTrigramWeights()
```
    Calculate trigram weights for contextual smoothing.
    The trigram weights are computed using deleted interpolation.
  - computeBigramWeights
```
protected void computeBigramWeights()
```
    Calculate bigram weights for contextual smoothing.
    The bigram weights are computed using deleted interpolation.
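    
    For orientation, here is one common formulation of deleted interpolation, the procedure described for the TnT tagger, reduced from trigrams to bigrams. Whether computeBigramWeights follows these steps exactly is an assumption; the class `DeletedInterpolationSketch`, the nested-map representation of bigram counts, and the tie-breaking rule (ties go to the bigram estimate) are all choices made for this sketch.

```java
import java.util.Map;

/** Hypothetical sketch of deleted interpolation for bigram weights. */
final class DeletedInterpolationSketch {
    /** (n - 1) / (d - 1), or 0 when (d - 1) <= 0. */
    static double smoothedRatio(int n, int d) {
        return (d - 1 <= 0) ? 0.0 : (double) (n - 1) / (d - 1);
    }

    /**
     * Returns { bigram weight, unigram weight }, normalized to sum to 1.0
     * whenever at least one bigram has a positive count.
     * unigramCounts maps tag -> count; bigramCounts maps tag1 -> tag2 -> count.
     */
    static double[] bigramWeights(Map<String, Integer> unigramCounts,
                                  Map<String, Map<String, Integer>> bigramCounts,
                                  int totalWords) {
        double[] lambda = new double[2];
        for (Map.Entry<String, Map<String, Integer>> row : bigramCounts.entrySet()) {
            String tag1 = row.getKey();
            for (Map.Entry<String, Integer> cell : row.getValue().entrySet()) {
                String tag2 = cell.getKey();
                int count = cell.getValue();
                if (count <= 0) {
                    continue;
                }
                // Estimates with one occurrence of this bigram "deleted" from the counts.
                double viaBigram = smoothedRatio(count, unigramCounts.getOrDefault(tag1, 0));
                double viaUnigram = smoothedRatio(unigramCounts.getOrDefault(tag2, 0), totalWords);
                // Credit the bigram's count to whichever estimate is larger.
                if (viaBigram >= viaUnigram) {
                    lambda[0] += count;
                } else {
                    lambda[1] += count;
                }
            }
        }
        double sum = lambda[0] + lambda[1];
        if (sum > 0.0) {
            lambda[0] /= sum;
            lambda[1] /= sum;
        }
        return lambda;
    }
}
```

    The trigram case adds a third competing estimate and a third weight, but the credit-then-normalize structure is the same.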
  - getCount
```
public int getCount(java.lang.String tag)
```
    Look up unigram count.
    
    Parameters:
    tag - The part of speech tag.
    
    Returns:
    Count of tag.
  - getCount
```
public int getCount(java.lang.String tag1,
           java.lang.String tag2)
```
    Look up bigram count.
    
    Parameters:
    tag1 - The first part of speech tag.
    tag2 - The second part of speech tag.
    
    Returns:
    Count of tag1 followed by tag2.
  - getCount
```
public int getCount(java.lang.String tag1,
           java.lang.String tag2,
           java.lang.String tag3)
```
    Look up trigram count.
    
    Parameters:
    tag1 - The first part of speech tag.
    tag2 - The second part of speech tag.
    tag3 - The third part of speech tag.
    
    Returns:
    Count of tag1 followed by tag2 followed by tag3.
  - getProbability
```
public double getProbability(java.lang.String tag)
```
    Look up unigram probability.
    
    Parameters:
    tag - The part of speech tag.
    
    Returns:
    Probability of tag.
  - getProbability
```
public double getProbability(java.lang.String tag1,
                    java.lang.String tag2)
```
    Look up bigram probability.
    
    Parameters:
    tag1 - The first part of speech tag.
    tag2 - The second part of speech tag.
    
    Returns:
    Transition probability of tag1 followed by tag2.
  - getProbability
```
public double getProbability(java.lang.String tag1,
                    java.lang.String tag2,
                    java.lang.String tag3)
```
    Look up trigram probability.
    
    Parameters:
    tag1 - The first part of speech tag.
    tag2 - The second part of speech tag.
    tag3 - The third part of speech tag.
    
    Returns:
    Transition probability of tag1 followed by tag2 followed by tag3.
  - rowKeySet
```
public java.util.Set<java.lang.String> rowKeySet()
```
    Get row key set.
    
    Returns:
    row key set.
  - columnKeySet
```
public java.util.Set<java.lang.String> columnKeySet()
```
    Get column key set.
    
    Returns:
    column key set.
  - sliceKeySet
```
public java.util.Set<java.lang.String> sliceKeySet()
```
    Get slice key set.
    
    Returns:
    slice key set.
  - getTotalWordCount
```
public int getTotalWordCount()
```
    Get total number of words.
    
    Returns:
    Total number of words.
  - loadTransitionMatrix
```
public void loadTransitionMatrix(java.net.URL url,
                        boolean compressed,
                        java.lang.String encoding,
                        char delimChar)
                          throws java.io.IOException
```
    Load transition matrix from a URL.
    
    Parameters:
    url - URL from which to load transition matrix.
    compressed - true if gzip compressed.
    encoding - Character encoding for file text.
    delimChar - Column separator character. Usually a tab (\t).
    
    Throws:
    
    java.io.IOException - when an I/O error occurs.
  - loadTransitionMatrix
```
public void loadTransitionMatrix(java.net.URL url,
                        java.lang.String encoding,
                        char delimChar)
                          throws java.io.IOException
```
    Load transition matrix from a URL.
    
    Parameters:
    url - URL from which to load transition matrix.
    encoding - Character encoding for file text.
    delimChar - Column separator character. Usually a tab (\t).
    
    Throws:
    
    java.io.IOException - when an I/O error occurs.
  - loadTransitionMatrix
```
public void loadTransitionMatrix(java.io.Reader reader,
                        char delimChar)
                          throws java.io.IOException
```
    Load transition matrix from a reader.
    
    Parameters:
    reader - Reader from which to read transition matrix.
    delimChar - Column separator character. Usually a tab (\t).
    
    Throws:
    
    java.io.IOException - when an I/O error occurs.
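    
    As a rough picture of what loading from a reader involves, the sketch below parses delimited rows of one to three tags followed by a count, the layout that saveTransitionMatrix documents. It is not the library's parser: the class `CountFileReaderSketch`, the single flat map keyed by the delimiter-joined tags, and the decision to skip malformed rows are assumptions of this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical sketch of reading delimited n-gram count rows. */
final class CountFileReaderSketch {
    /** Reads "tags... count" rows; each map key is the row's tags joined by delimChar. */
    static Map<String, Integer> readCounts(Reader reader, char delimChar) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        // Quote the delimiter so characters such as '|' are not treated as regex syntax.
        String delimiterPattern = Pattern.quote(String.valueOf(delimChar));
        BufferedReader in = new BufferedReader(reader);
        String line;
        while ((line = in.readLine()) != null) {
            if (line.isEmpty()) {
                continue;
            }
            String[] columns = line.split(delimiterPattern);
            // Two to four columns: one to three tags followed by a count.
            if (columns.length < 2 || columns.length > 4) {
                continue;
            }
            String key = line.substring(0, line.lastIndexOf(delimChar));
            int count = Integer.parseInt(columns[columns.length - 1].trim());
            counts.merge(key, count, Integer::sum);
        }
        return counts;
    }
}
```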
  - displayNGramCounts
```
public void displayNGramCounts()
```
    Display the ngram counts.
  - saveTransitionMatrix
```
public void saveTransitionMatrix(java.lang.String transitionFileName,
                        java.lang.String encoding,
                        char delimChar)
                          throws java.io.IOException
```
    Save transition matrix to a file.
    
    Parameters:
    transitionFileName - File to receive the transition matrix.
    encoding - Character encoding for file text.
    delimChar - Column separator character. Usually a tab (\t).
    
    Throws:
    
    java.io.IOException - when an I/O error occurs.
    Each unigram, bigram, and trigram entry in the transition matrix is saved in a columnar format, with the specified delimiter character acting as the column separator. The counts are saved rather than the probabilities, so that different smoothing methods can be applied without recreating the training data.
    
    tag count
    tag1 tag2 count
    tag1 tag2 tag3 count
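    
    The columnar layout described above can be produced with a few lines of code. This is a sketch under stated assumptions rather than the library's writer: the class `CountFileWriterSketch`, the use of a list of tags as the map key, the newline row terminator, and the row order (whatever order the map iterates in) are all hypothetical.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of writing n-gram counts in delimited columns. */
final class CountFileWriterSketch {
    /** Writes one row per n-gram: its tags, then its count, separated by delimChar. */
    static void writeCounts(Writer writer,
                            Map<List<String>, Integer> counts,
                            char delimChar) throws IOException {
        String delimiter = String.valueOf(delimChar);
        for (Map.Entry<List<String>, Integer> entry : counts.entrySet()) {
            writer.write(String.join(delimiter, entry.getKey()));
            writer.write(delimiter);
            writer.write(Integer.toString(entry.getValue()));
            writer.write("\n");
        }
        writer.flush();
    }
}
```

    Because only counts are written, a reader of the file can recompute probabilities with any smoothing scheme it prefers.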
  - saveTransitionMatrix
```
public void saveTransitionMatrix(java.io.Writer writer,
                        char delimChar)
                          throws java.io.IOException
```
    Save transition matrix to a writer.
    
    Parameters:
    writer - Writer to use to save transition matrix.
    delimChar - Column separator character. Usually a tab (\t).
    
    Throws:
    
    java.io.IOException - when an I/O error occurs.
  - getBigramWeights
```
public double[] getBigramWeights()
```
    Return weights for bigrams using deleted interpolation.
    
    Returns:
    Two element double array "lambda" of ngram weights.
    lambda[0] = bigram weight
    lambda[1] = unigram weight
    
    The sum of the lambda values is 1.0. The adjusted probability for a bigram is computed from the maximum likelihood probabilities (i.e., undiscounted) as follows.
    
    p*( tag2 | tag1 ) = lambda[0] * p( tag2 | tag1 ) + lambda[1] * p( tag2 )
  - getTrigramWeights
```
public double[] getTrigramWeights()
```
    Return weights for trigrams using deleted interpolation.
    
    Returns:
    Three element double array "lambda" of ngram weights.
    lambda[0] = trigram weight
    lambda[1] = bigram weight
    lambda[2] = unigram weight
    
    The sum of the lambda values is 1.0. The adjusted probability for a trigram is computed from the maximum likelihood probabilities (i.e., undiscounted) as follows.
    
    p*( tag3 | tag1 , tag2 ) = lambda[0] * p( tag3 | tag1 , tag2 ) + lambda[1] * p( tag3 | tag2 ) + lambda[2] * p( tag3 )
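    
    The bigram and trigram formulas above translate directly into code. In this sketch the class `InterpolatedProbabilitySketch` and its parameter names are hypothetical; the weight arrays are assumed to be ordered as the Returns text describes, and the probabilities passed in are assumed to be the undiscounted maximum likelihood estimates.

```java
/** Hypothetical sketch of applying deleted-interpolation weights. */
final class InterpolatedProbabilitySketch {
    /** p*(tag2 | tag1) = lambda[0] * p(tag2 | tag1) + lambda[1] * p(tag2). */
    static double bigram(double[] lambda, double pBigram, double pUnigram) {
        return lambda[0] * pBigram + lambda[1] * pUnigram;
    }

    /**
     * p*(tag3 | tag1, tag2) = lambda[0] * p(tag3 | tag1, tag2)
     *                       + lambda[1] * p(tag3 | tag2)
     *                       + lambda[2] * p(tag3).
     */
    static double trigram(double[] lambda, double pTrigram, double pBigram, double pUnigram) {
        return lambda[0] * pTrigram + lambda[1] * pBigram + lambda[2] * pUnigram;
    }
}
```

    The point of the interpolation is visible when a trigram was never seen in training: its own estimate is 0, but the bigram and unigram terms still contribute, so the adjusted probability stays above zero.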
