java.lang.Object
  edu.northwestern.at.utils.corpuslinguistics.textsegmenter.c99.C99
public class C99
Choi's C99 algorithm for linear text segmentation
Use of this code is free for academic, education, research and other non-profit making uses only.
| Nested Class Summary | |
|---|---|
| protected static class | C99.Region: Text segment region. |
| Constructor Summary | |
|---|---|
| C99() | |
| Method Summary | |
|---|---|
| protected static int[] | boundaries(double[][] m, int n): Find density-maximizing boundaries for regions in a similarity matrix. |
| protected static ContextVector[] | normalize(java.lang.String[][] document, ContextVector tf, StopWords stopWords, Stemmer stemmer): Produce stem frequency tables for a tokenized document. |
| protected static ContextVector[] | normalize(java.lang.String[][] document, StopWords stopWords, Stemmer stemmer): Produce stem frequency tables for a tokenized document. |
| protected static double[][] | rank(double[][] f, int maskSize): Apply hard ranking to matrix using a mask. |
| static java.lang.String[][][] | segment(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer): Segment document into coherent topic segments. |
| static java.lang.String[][][] | segmentW(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer): Segment document into coherent topic segments. |
| protected static double[][] | similarity(ContextVector[] v): Given context vectors, compute the similarity matrix. |
| protected static double[][] | similarity(ContextVector[] v, EntropyVector entropy): Given context vectors, compute the similarity matrix. |
| protected static java.lang.String[][][] | split(java.lang.String[][] text, int[] boundaries): Split text into segment blocks given topic boundaries. |
| protected static double[][] | sum(double[][] rankMatrix): Compute sum of rank matrix. |
| Methods inherited from class java.lang.Object |
|---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Constructor Detail |
|---|
public C99()
| Method Detail |
|---|
protected static int[] boundaries(double[][] m, int n)
m - Similarity matrix.
n - Number of regions to find. If n = -1, the algorithm will determine the number of regions.
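The page does not say how density is measured or how boundaries are searched. As a minimal sketch, assuming density means the sum of matrix values inside the square diagonal blocks of a segmentation divided by the total area of those blocks, the objective can be illustrated as below. The class `DensitySketch`, both method names, and the convention that each boundary is the start index of a new region are hypothetical and are not part of C99.

```java
final class DensitySketch {
    /**
     * Density of a candidate segmentation (assumed definition).
     * boundaries holds the sorted start index of every region after the first.
     */
    static double density(double[][] m, int[] boundaries) {
        double sum = 0.0;
        double area = 0.0;
        int start = 0;
        for (int k = 0; k <= boundaries.length; k++) {
            int end = (k < boundaries.length) ? boundaries[k] : m.length;
            // Add up the square block of m that this region covers on the diagonal.
            for (int i = start; i < end; i++) {
                for (int j = start; j < end; j++) {
                    sum += m[i][j];
                }
            }
            area += (double) (end - start) * (end - start);
            start = end;
        }
        return area == 0.0 ? 0.0 : sum / area;
    }

    /** Tries every single split point and returns the one with the highest density. */
    static int bestSingleBoundary(double[][] m) {
        int best = -1;
        double bestDensity = Double.NEGATIVE_INFINITY;
        for (int b = 1; b < m.length; b++) {
            double d = density(m, new int[] {b});
            if (d > bestDensity) {
                bestDensity = d;
                best = b;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] m = {
            {1, 1, 0, 0},
            {1, 1, 0, 0},
            {0, 0, 1, 1},
            {0, 0, 1, 1}
        };
        System.out.println(density(m, new int[] {}));   // one region covering everything
        System.out.println(bestSingleBoundary(m));      // split between the two blocks
    }
}
```

Repeating the best-split search inside the resulting regions would give a divisive procedure, but whether `boundaries` works that way is not stated here.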
protected static ContextVector[] normalize(java.lang.String[][] document, StopWords stopWords, Stemmer stemmer)
document - Tokenized document.
stopWords - Stop words.
stemmer - Stemmer.
protected static ContextVector[] normalize(java.lang.String[][] document, ContextVector tf, StopWords stopWords, Stemmer stemmer)
document - Tokenized document.
tf - Term frequencies in document.
stopWords - Stop words.
stemmer - Stemmer.
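To illustrate what a stem frequency table per text block might look like, here is a minimal sketch that uses only standard collections. It does not use the library's `ContextVector`, `StopWords`, or `Stemmer` types, whose APIs are not shown on this page; the class `StemFrequencySketch`, the `Set<String>` stop list, the `UnaryOperator<String>` stemmer, and the lowercasing step are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

final class StemFrequencySketch {
    /** Builds one stem-to-count table per block of the tokenized document. */
    static List<Map<String, Integer>> normalize(String[][] document,
                                                Set<String> stopWords,
                                                UnaryOperator<String> stemmer) {
        List<Map<String, Integer>> tables = new ArrayList<>();
        for (String[] block : document) {
            Map<String, Integer> freq = new HashMap<>();
            for (String token : block) {
                String word = token.toLowerCase(Locale.ROOT);
                // Skip empty tokens and stop words before stemming.
                if (word.isEmpty() || stopWords.contains(word)) {
                    continue;
                }
                freq.merge(stemmer.apply(word), 1, Integer::sum);
            }
            tables.add(freq);
        }
        return tables;
    }

    public static void main(String[] args) {
        String[][] document = {{"The", "cats", "cat"}, {"dogs"}};
        // Toy stemmer for the demo only: strips one trailing "s".
        UnaryOperator<String> toyStemmer =
            w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
        System.out.println(normalize(document, Set.of("the"), toyStemmer));
    }
}
```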
protected static double[][] rank(double[][] f, int maskSize)
f - Matrix to which to apply hard ranking.
maskSize - Mask size.
Hard ranking replaces a pixel value with the proportion of neighboring values it exceeds, using a maskSize x maskSize mask.
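The ranking step described above can be sketched as follows. This is an illustration under stated assumptions and not the class's actual code: the comparison is strict, the centre cell is not counted as its own neighbor, and near the matrix border the mask is clipped to the cells that exist. The class name `HardRankSketch` is hypothetical.

```java
final class HardRankSketch {
    /** Replaces each cell with the fraction of in-mask neighbors it strictly exceeds. */
    static double[][] rank(double[][] f, int maskSize) {
        if (maskSize < 3 || maskSize % 2 == 0) {
            throw new IllegalArgumentException("maskSize must be an odd number >= 3");
        }
        int rows = f.length;
        int half = maskSize / 2;
        double[][] ranked = new double[rows][];
        for (int i = 0; i < rows; i++) {
            ranked[i] = new double[f[i].length];
            for (int j = 0; j < f[i].length; j++) {
                int exceeded = 0;
                int neighbors = 0;
                // Visit the mask centred on (i, j), clipped to the matrix bounds.
                for (int y = Math.max(0, i - half); y <= Math.min(rows - 1, i + half); y++) {
                    for (int x = Math.max(0, j - half); x <= Math.min(f[y].length - 1, j + half); x++) {
                        if (y == i && x == j) {
                            continue; // assumption: the centre is not its own neighbor
                        }
                        neighbors++;
                        if (f[i][j] > f[y][x]) {
                            exceeded++;
                        }
                    }
                }
                ranked[i][j] = neighbors == 0 ? 0.0 : (double) exceeded / neighbors;
            }
        }
        return ranked;
    }

    public static void main(String[] args) {
        double[][] f = {{1, 2, 3}, {4, 5, 6}, {7, 8, 9}};
        double[][] r = rank(f, 3);
        System.out.println(r[1][1]); // centre exceeds 4 of its 8 neighbors
    }
}
```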
public static java.lang.String[][][] segment(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer)
document - Document text as a list of elementary text blocks.
n - Number of topic segments desired. Set n = -1 to have the algorithm select the number of topic segments by monitoring the rate of increase in segment density.
s - Size of ranking mask. Must be an odd number >= 3.
stopWords - Stop words.
stemmer - Stemmer.
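Before calling `segment`, a caller needs a `String[][]` of tokenized text blocks and arguments that satisfy the constraints listed above. The sketch below shows one way to prepare them. The class `SegmentInputSketch` and both helpers are hypothetical, whitespace tokenization is an assumption, and rejecting every `n` below 1 other than -1 is a guess at what the method accepts. The final call is left as a comment because this page does not show how `StopWords` and `Stemmer` instances are obtained.

```java
final class SegmentInputSketch {
    /** Turns one string per elementary text block into whitespace-separated tokens. */
    static String[][] toBlocks(String[] blocks) {
        String[][] document = new String[blocks.length][];
        for (int i = 0; i < blocks.length; i++) {
            String trimmed = blocks[i].trim();
            document[i] = trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
        }
        return document;
    }

    /** Checks n and s against the documented parameter constraints. */
    static void checkArguments(int n, int s) {
        if (n != -1 && n < 1) {
            throw new IllegalArgumentException("n must be -1 or a positive segment count");
        }
        if (s < 3 || s % 2 == 0) {
            throw new IllegalArgumentException("s must be an odd number >= 3");
        }
    }

    public static void main(String[] args) {
        String[][] document = toBlocks(new String[] {
            "The cat sat on the mat",
            "Stock prices fell sharply"
        });
        int n = -1; // let the algorithm choose the number of segments
        int s = 11; // arbitrary odd mask size chosen for illustration
        checkArguments(n, s);
        System.out.println(document.length + " blocks ready");
        // With library-provided stopWords and stemmer objects (not shown here):
        // String[][][] topics = C99.segment(document, n, s, stopWords, stemmer);
    }
}
```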
public static java.lang.String[][][] segmentW(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer)
document - Document text as a list of elementary text blocks.
n - Number of topic segments desired. Set n = -1 to have the algorithm select the number of topic segments by monitoring the rate of increase in segment density.
s - Size of ranking mask. Must be an odd number >= 3.
stopWords - Stop words.
stemmer - Stemmer.
protected static double[][] similarity(ContextVector[] v)
v - context vectors.
protected static double[][] similarity(ContextVector[] v, EntropyVector entropy)
v - context vectors.
entropy - entropy vector.
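This page does not name the similarity measure. As a sketch only, assuming cosine similarity over stem counts, a pairwise matrix could be built as below. Plain `Map<String, Integer>` tables stand in for `ContextVector`, the class `CosineSimilaritySketch` is hypothetical, and the entropy-weighted overload is not covered.

```java
import java.util.List;
import java.util.Map;

final class CosineSimilaritySketch {
    /** Cosine of the angle between two sparse count vectors; 0 if either is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            normA += (double) e.getValue() * e.getValue();
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        for (int count : b.values()) {
            normB += (double) count * count;
        }
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
    }

    /** Symmetric matrix of pairwise similarities between all vectors. */
    static double[][] similarity(List<Map<String, Integer>> v) {
        int n = v.size();
        double[][] m = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = i; j < n; j++) {
                m[i][j] = cosine(v.get(i), v.get(j));
                m[j][i] = m[i][j];
            }
        }
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> v = List.of(
            Map.of("cat", 1, "dog", 1),
            Map.of("cat", 1),
            Map.of("fish", 2));
        double[][] m = similarity(v);
        System.out.println(m[0][1]); // blocks sharing "cat"
        System.out.println(m[0][2]); // blocks with no shared stems
    }
}
```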
protected static java.lang.String[][][] split(java.lang.String[][] text, int[] boundaries)
text - Source text.
boundaries - Boundaries.
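The meaning of the `boundaries` values is not documented here. A minimal sketch, assuming each boundary is the index of the first block of a new segment and that the array is sorted in ascending order, might look like this. The class name `SplitSketch` is hypothetical.

```java
import java.util.Arrays;

final class SplitSketch {
    /** Cuts text into consecutive segments; boundaries[k] starts segment k + 1 (assumed). */
    static String[][][] split(String[][] text, int[] boundaries) {
        String[][][] segments = new String[boundaries.length + 1][][];
        int start = 0;
        for (int k = 0; k <= boundaries.length; k++) {
            // The last segment runs to the end of the text.
            int end = (k < boundaries.length) ? boundaries[k] : text.length;
            segments[k] = Arrays.copyOfRange(text, start, end);
            start = end;
        }
        return segments;
    }

    public static void main(String[] args) {
        String[][] text = {{"a"}, {"b"}, {"c"}, {"d"}};
        String[][][] segments = split(text, new int[] {1, 3});
        System.out.println(segments.length + " segments");
    }
}
```

With four blocks and boundaries {1, 3}, this yields segments of one, two, and one block.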
protected static double[][] sum(double[][] rankMatrix)
rankMatrix - Rank matrix.