C99 (MorphAdorner)

java.lang.Object
- edu.northwestern.at.morphadorner.corpuslinguistics.textsegmenter.c99.C99

```
public class C99
extends java.lang.Object
```
Choi's C99 algorithm for linear text segmentation.

Author:

Freddy Choi, Philip R. Burns. Modified for integration in MorphAdorner.
Use of this code is free for academic, education, research and other non-profit making uses only.

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`protected static class`	`C99.Region` Text segment region.

Constructor Summary

Constructors
Constructor and Description
`C99()`

Method Summary

Methods
Modifier and Type	Method and Description
`protected static int[]`	`boundaries(double[][] m, int n)` Find density maximizing boundaries for regions in a similarity matrix.
`protected static ContextVector[]`	`normalize(java.lang.String[][] document, ContextVector tf, StopWords stopWords, Stemmer stemmer)` Produce stem frequency tables for a tokenized document.
`protected static ContextVector[]`	`normalize(java.lang.String[][] document, StopWords stopWords, Stemmer stemmer)` Produce stem frequency tables for a tokenized document.
`protected static double[][]`	`rank(double[][] f, int maskSize)` Apply hard ranking to matrix using a mask.
`static java.lang.String[][][]`	`segment(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer)` Segment document into coherent topic segments.
`static java.lang.String[][][]`	`segmentW(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer)` Segment document into coherent topic segments.
`protected static double[][]`	`similarity(ContextVector[] v)` Given context vectors, compute the similarity matrix.
`protected static double[][]`	`similarity(ContextVector[] v, EntropyVector entropy)` Given context vectors, compute the similarity matrix.
`protected static java.lang.String[][][]`	`split(java.lang.String[][] text, int[] boundaries)` Split text into segment blocks given topic boundaries.
`protected static double[][]`	`sum(double[][] rankMatrix)` Compute sum of rank matrix.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - C99
```
public C99()
```
- Method Detail
  - boundaries
```
protected static int[] boundaries(double[][] m,
               int n)
```
    Find density maximizing boundaries for regions in a similarity matrix.
    
    Parameters:
    m - Similarity matrix.
    n - Number of regions to find. If n = -1, the algorithm determines the number of regions.
    
    Returns:
    Boundaries of regions in selection order.
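    As an illustration of density-maximizing boundary selection, the following is a minimal sketch and not the MorphAdorner implementation. It assumes a square matrix, a fixed number of regions, and that a boundary value is the index of the first row of a new region. The class name BoundariesSketch and the helper blockSum are hypothetical. The mode in which the algorithm chooses the number of regions itself is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BoundariesSketch {
    /** Sum of the square block covering rows/columns from..to (inclusive), read from prefix sums p. */
    private static double blockSum(double[][] p, int from, int to) {
        return p[to + 1][to + 1] - p[from][to + 1] - p[to + 1][from] + p[from][from];
    }

    /** Greedy divisive splitting: each step adds the cut that maximizes (sum of block sums) / (sum of block areas). */
    static int[] boundaries(double[][] m, int n) {
        if (n < 1) throw new IllegalArgumentException("this sketch handles a fixed n >= 1 only");
        int size = m.length;
        // 2D prefix sums so that any block sum costs O(1).
        double[][] p = new double[size + 1][size + 1];
        for (int i = 0; i < size; i++)
            for (int j = 0; j < size; j++)
                p[i + 1][j + 1] = m[i][j] + p[i][j + 1] + p[i + 1][j] - p[i][j];

        List<Integer> starts = new ArrayList<>(); // sorted start index of every current region
        starts.add(0);
        int[] picked = new int[n - 1];            // cuts in the order they were selected
        for (int step = 0; step < n - 1; step++) {
            double totalSum = 0, totalArea = 0;
            for (int k = 0; k < starts.size(); k++) {
                int from = starts.get(k);
                int to = (k + 1 < starts.size() ? starts.get(k + 1) : size) - 1;
                totalSum += blockSum(p, from, to);
                totalArea += (double) (to - from + 1) * (to - from + 1);
            }
            double bestDensity = Double.NEGATIVE_INFINITY;
            int bestCut = -1, bestPos = -1;
            for (int k = 0; k < starts.size(); k++) {
                int from = starts.get(k);
                int to = (k + 1 < starts.size() ? starts.get(k + 1) : size) - 1;
                double regionSum = blockSum(p, from, to);
                double regionArea = (double) (to - from + 1) * (to - from + 1);
                for (int cut = from + 1; cut <= to; cut++) {
                    // Replace the region by its two halves and recompute the overall density.
                    double sum = totalSum - regionSum + blockSum(p, from, cut - 1) + blockSum(p, cut, to);
                    double area = totalArea - regionArea
                            + (double) (cut - from) * (cut - from)
                            + (double) (to - cut + 1) * (to - cut + 1);
                    double density = sum / area;
                    if (density > bestDensity) {
                        bestDensity = density;
                        bestCut = cut;
                        bestPos = k + 1;
                    }
                }
            }
            if (bestCut < 0) return Arrays.copyOf(picked, step); // every region is a single row already
            starts.add(bestPos, bestCut);
            picked[step] = bestCut;
        }
        return picked;
    }
}
```

    Because cuts are recorded as they are chosen, the returned array is in selection order rather than sorted order, which matches the wording of the return description above.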
  - normalize
```
protected static ContextVector[] normalize(java.lang.String[][] document,
                        StopWords stopWords,
                        Stemmer stemmer)
```
    Produce stem frequency tables for a tokenized document.
    
    Parameters:
    document - Tokenized document.
    stopWords - Stop words.
    stemmer - Stemmer.
    
    Returns:
    Context vector of stem frequencies.
  - normalize
```
protected static ContextVector[] normalize(java.lang.String[][] document,
                        ContextVector tf,
                        StopWords stopWords,
                        Stemmer stemmer)
```
    Produce stem frequency tables for a tokenized document.
    
    Parameters:
    document - Tokenized document.
    tf - Term frequencies in document.
    stopWords - Stop words.
    stemmer - Stemmer.
    
    Returns:
    Context vector of stem frequencies.
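    To illustrate what a stem frequency table per text block might look like, here is a minimal sketch that does not use the MorphAdorner types. ContextVector, StopWords, and Stemmer are replaced by plain java.util stand-ins (a Map, a Set, and a UnaryOperator), and the class name StemFrequencySketch is hypothetical. Lower-casing tokens before the stop word check is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

final class StemFrequencySketch {
    /**
     * Builds one stem-to-count table per text block and also adds every
     * counted stem to tf, the running document-wide frequency table.
     */
    static List<Map<String, Integer>> normalize(
            String[][] document,
            Map<String, Integer> tf,
            Set<String> stopWords,
            UnaryOperator<String> stemmer) {
        List<Map<String, Integer>> tables = new ArrayList<>();
        for (String[] block : document) {
            Map<String, Integer> counts = new HashMap<>();
            for (String token : block) {
                String lower = token.toLowerCase(Locale.ROOT);
                if (stopWords.contains(lower)) continue; // skip stop words
                String stem = stemmer.apply(lower);
                if (stem.isEmpty()) continue;            // skip tokens that stem to nothing
                counts.merge(stem, 1, Integer::sum);
                tf.merge(stem, 1, Integer::sum);
            }
            tables.add(counts);
        }
        return tables;
    }
}
```

    Passing a fresh, throwaway map as tf gives the behavior of the three-argument overload, where only the per-block tables are of interest.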
  - rank
```
protected static double[][] rank(double[][] f,
              int maskSize)
```
    Apply hard ranking to matrix using a mask.
    Hard ranking replaces a pixel value with the proportion of neighboring values it exceeds, using a maskSize x maskSize mask.
    
    Parameters:
    f - Matrix to which to apply hard ranking.
    maskSize - Mask size.
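    The hard ranking described above can be sketched as follows. This is an illustration under assumptions, not the MorphAdorner source: the mask is centered on each element and clipped at the matrix edges, the center element is excluded from the count, and "exceeds" is read as strictly greater than. The class name HardRankSketch is hypothetical.

```java
final class HardRankSketch {
    /** Replaces each element with the fraction of its mask neighbors that it strictly exceeds. */
    static double[][] rank(double[][] f, int maskSize) {
        int n = f.length;
        int half = maskSize / 2; // for an odd mask, the reach on each side of the center
        double[][] r = new double[n][];
        for (int i = 0; i < n; i++) {
            r[i] = new double[f[i].length];
            for (int j = 0; j < f[i].length; j++) {
                int lower = 0;    // neighbors with a smaller value than f[i][j]
                int examined = 0; // neighbors that fall inside the matrix
                for (int y = Math.max(0, i - half); y <= Math.min(n - 1, i + half); y++) {
                    for (int x = Math.max(0, j - half); x <= Math.min(f[y].length - 1, j + half); x++) {
                        if (y == i && x == j) continue; // the center is not its own neighbor
                        examined++;
                        if (f[i][j] > f[y][x]) lower++;
                    }
                }
                r[i][j] = examined == 0 ? 0.0 : (double) lower / examined;
            }
        }
        return r;
    }
}
```

    Dividing by the number of neighbors actually examined, rather than by the full mask area, keeps values near the matrix edges on the same 0 to 1 scale as interior values.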
  - segment
```
public static java.lang.String[][][] segment(java.lang.String[][] document,
                             int n,
                             int s,
                             StopWords stopWords,
                             Stemmer stemmer)
```
    Segment document into coherent topic segments.
    
    Parameters:
    document - Document text as list of elementary text blocks.
    n - Number of topic segments desired. Set n = -1 to have the algorithm select the number of topic segments by monitoring the rate of increase in segment density.
    s - Size of ranking mask. Must be an odd number >= 3.
    stopWords - Stop words.
    stemmer - Stemmer.
    
    Returns:
    Coherent topic segments.
  - segmentW
```
public static java.lang.String[][][] segmentW(java.lang.String[][] document,
                              int n,
                              int s,
                              StopWords stopWords,
                              Stemmer stemmer)
```
    Segment document into coherent topic segments.
    
    Parameters:
    document - Document text as list of elementary text blocks.
    n - Number of topic segments desired. Set n = -1 to have the algorithm select the number of topic segments by monitoring the rate of increase in segment density.
    s - Size of ranking mask. Must be an odd number >= 3.
    stopWords - Stop words.
    stemmer - Stemmer.
    
    Returns:
    Coherent topic segments.
  - similarity
```
protected static double[][] similarity(ContextVector[] v)
```
    Given context vectors, compute the similarity matrix.
    
    Parameters:
    v - Context vectors.
    
    Returns:
    Similarity matrix.
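    A similarity matrix over frequency tables can be sketched as below. The use of cosine similarity is an assumption of this sketch, since this page does not name the measure, and plain maps stand in for ContextVector. The class name CosineSimilaritySketch is hypothetical, and entropy weighting is not shown.

```java
import java.util.List;
import java.util.Map;

final class CosineSimilaritySketch {
    /** Cosine of the angle between two sparse frequency vectors; 0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            double x = e.getValue();
            normA += x * x;
            Integer other = b.get(e.getKey());
            if (other != null) dot += x * other; // only shared stems contribute
        }
        for (int c : b.values()) normB += (double) c * c;
        return (normA == 0 || normB == 0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Symmetric matrix s where s[i][j] is the similarity of vectors i and j. */
    static double[][] similarity(List<Map<String, Integer>> v) {
        int n = v.size();
        double[][] s = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                s[i][j] = cosine(v.get(i), v.get(j));
                s[j][i] = s[i][j]; // fill the mirror cell
            }
        }
        return s;
    }
}
```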
  - similarity
```
protected static double[][] similarity(ContextVector[] v,
                    EntropyVector entropy)
```
    Given context vectors, compute the similarity matrix.
    
    Parameters:
    v - Context vectors.
    entropy - Entropy vector.
    
    Returns:
    Similarity matrix.
  - split
```
protected static java.lang.String[][][] split(java.lang.String[][] text,
                           int[] boundaries)
```
    Split text into segment blocks given topic boundaries.
    
    Parameters:
    text - Source text.
    boundaries - Boundaries.
    
    Returns:
    Topic segments.
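    Splitting a block list at boundaries can be sketched as follows. The boundary convention is an assumption here: each value is taken to be the index of the first block of a new segment, strictly between 0 and text.length, with no duplicates. The array is sorted first because boundaries may arrive in selection order. The class name SplitSketch is hypothetical.

```java
import java.util.Arrays;

final class SplitSketch {
    /** Cuts text into boundaries.length + 1 consecutive segments. */
    static String[][][] split(String[][] text, int[] boundaries) {
        int[] cuts = boundaries.clone(); // leave the caller's array untouched
        Arrays.sort(cuts);
        String[][][] segments = new String[cuts.length + 1][][];
        int start = 0;
        for (int k = 0; k <= cuts.length; k++) {
            // The last segment runs to the end of the text.
            int end = (k < cuts.length) ? cuts[k] : text.length;
            segments[k] = Arrays.copyOfRange(text, start, end);
            start = end;
        }
        return segments;
    }
}
```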
  - sum
```
protected static double[][] sum(double[][] rankMatrix)
```
    Compute sum of rank matrix.
    
    Parameters:
    rankMatrix - Rank matrix.
    
    Returns:
    Sum of rank matrix.
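    Since the method returns a matrix rather than a single number, one plausible reading is a table of region sums. The sketch below follows that reading as an assumption, not as a statement of what the MorphAdorner code does: for i <= j, entry [i][j] holds the sum of all rank values in the square block spanning rows and columns i through j. The class name RegionSumSketch is hypothetical.

```java
final class RegionSumSketch {
    /** Upper-triangular table: s[i][j] is the sum of rank[a][b] for i <= a, b <= j. */
    static double[][] sum(double[][] rank) {
        int n = rank.length;
        double[][] s = new double[n][n];
        for (int i = 0; i < n; i++) s[i][i] = rank[i][i]; // 1 x 1 blocks
        for (int len = 2; len <= n; len++) {
            for (int i = 0; i + len - 1 < n; i++) {
                int j = i + len - 1;
                // Block i..j is the union of blocks (i+1..j) and (i..j-1), which overlap
                // in block (i+1..j-1), plus the two corner cells (i,j) and (j,i).
                double inner = (len > 2) ? s[i + 1][j - 1] : 0.0;
                s[i][j] = s[i + 1][j] + s[i][j - 1] - inner + rank[i][j] + rank[j][i];
            }
        }
        return s;
    }
}
```

    With such a table, the density of any candidate region can be read off in constant time as s[i][j] divided by the block area (j - i + 1) squared.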
