public class C99
extends java.lang.Object
Use of this code is free for academic, educational, research and other non-profit uses only.
Modifier and Type | Class and Description |
---|---|
protected static class | C99.Region - Text segment region. |
Constructor and Description |
---|
C99() |
Modifier and Type | Method and Description |
---|---|
protected static int[] | boundaries(double[][] m, int n) - Find density maximizing boundaries for regions in a similarity matrix. |
protected static ContextVector[] | normalize(java.lang.String[][] document, ContextVector tf, StopWords stopWords, Stemmer stemmer) - Produce stem frequency tables for a tokenized document. |
protected static ContextVector[] | normalize(java.lang.String[][] document, StopWords stopWords, Stemmer stemmer) - Produce stem frequency tables for a tokenized document. |
protected static double[][] | rank(double[][] f, int maskSize) - Apply hard ranking to matrix using a mask. |
static java.lang.String[][][] | segment(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer) - Segment document into coherent topic segments. |
static java.lang.String[][][] | segmentW(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer) - Segment document into coherent topic segments. |
protected static double[][] | similarity(ContextVector[] v) - Given context vectors, compute the similarity matrix. |
protected static double[][] | similarity(ContextVector[] v, EntropyVector entropy) - Given context vectors, compute the similarity matrix. |
protected static java.lang.String[][][] | split(java.lang.String[][] text, int[] boundaries) - Split text into segment blocks given topic boundaries. |
protected static double[][] | sum(double[][] rankMatrix) - Compute sum of rank matrix. |
protected static int[] boundaries(double[][] m, int n)
m - Similarity matrix.
n - Number of regions to find. If n = -1, the algorithm will determine the number of regions.

protected static ContextVector[] normalize(java.lang.String[][] document, StopWords stopWords, Stemmer stemmer)
document - Tokenized document.
stopWords - Stop words.
stemmer - Stemmer.

protected static ContextVector[] normalize(java.lang.String[][] document, ContextVector tf, StopWords stopWords, Stemmer stemmer)
document - Tokenized document.
tf - Term frequencies in document.
stopWords - Stop words.
stemmer - Stemmer.

protected static double[][] rank(double[][] f, int maskSize)
f - Matrix to which to apply hard ranking.
maskSize - Mask size.
Hard ranking replaces each value in the matrix (a "pixel") with the proportion of neighboring values it exceeds, using a maskSize x maskSize mask.
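
The hard ranking step described above can be illustrated with a minimal sketch. This is not the library's implementation of C99.rank: the class name HardRankSketch is hypothetical, and the sketch assumes that the mask is clipped at the matrix edges, that the cell itself is not counted as a neighbor, and that "exceeds" means strictly greater than.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the C99.rank implementation. */
class HardRankSketch {

    /**
     * Replace each value with the proportion of its neighbors that it
     * strictly exceeds. Neighbors are the cells inside a maskSize x maskSize
     * window centered on the value (assumption: the window is clipped at the
     * matrix edges and the center cell is excluded).
     */
    static double[][] rank(double[][] f, int maskSize) {
        if (maskSize < 3 || maskSize % 2 == 0) {
            throw new IllegalArgumentException("maskSize must be an odd number >= 3");
        }
        int half = maskSize / 2;
        int rows = f.length;
        double[][] ranked = new double[rows][];
        for (int i = 0; i < rows; i++) {
            ranked[i] = new double[f[i].length];
            for (int j = 0; j < f[i].length; j++) {
                int exceeded = 0;
                int neighbors = 0;
                for (int y = Math.max(0, i - half); y <= Math.min(rows - 1, i + half); y++) {
                    for (int x = Math.max(0, j - half); x <= Math.min(f[y].length - 1, j + half); x++) {
                        if (y == i && x == j) {
                            continue; // skip the center cell
                        }
                        neighbors++;
                        if (f[i][j] > f[y][x]) {
                            exceeded++;
                        }
                    }
                }
                ranked[i][j] = neighbors == 0 ? 0.0 : (double) exceeded / neighbors;
            }
        }
        return ranked;
    }

    public static void main(String[] args) {
        double[][] f = {
            {1.0, 0.2, 0.1},
            {0.2, 1.0, 0.5},
            {0.1, 0.5, 1.0}
        };
        for (double[] row : rank(f, 3)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

With a 3 x 3 mask, the center value 1.0 in this example exceeds six of its eight neighbors, so its rank is 0.75. Ranking in this way replaces absolute similarity values with their local ordering.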
public static java.lang.String[][][] segment(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer)
document - Document text as a list of elementary text blocks.
n - Number of topic segments desired. Set n = -1 to have the algorithm select the number of topic segments by monitoring the rate of increase in segment density.
s - Size of the ranking mask. Must be an odd number >= 3.
stopWords - Stop words.
stemmer - Stemmer.

public static java.lang.String[][][] segmentW(java.lang.String[][] document, int n, int s, StopWords stopWords, Stemmer stemmer)
document - Document text as a list of elementary text blocks.
n - Number of topic segments desired. Set n = -1 to have the algorithm select the number of topic segments by monitoring the rate of increase in segment density.
s - Size of the ranking mask. Must be an odd number >= 3.
stopWords - Stop words.
stemmer - Stemmer.

protected static double[][] similarity(ContextVector[] v)
v - Context vectors.

protected static double[][] similarity(ContextVector[] v, EntropyVector entropy)
v - Context vectors.
entropy - Entropy vector.

protected static java.lang.String[][][] split(java.lang.String[][] text, int[] boundaries)
text - Source text.
boundaries - Boundaries.

protected static double[][] sum(double[][] rankMatrix)
rankMatrix - Rank matrix.
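
As a rough illustration of what split does with its two arguments, the sketch below cuts a tokenized text into segment blocks. It is not the library's code: the class name SplitSketch is hypothetical, and the boundary convention is an assumption, since the documentation does not define it. Here each boundary is taken to be the index of the first text block of a new segment.

```java
import java.util.Arrays;

/** Illustrative sketch only; not the C99.split implementation. */
class SplitSketch {

    /**
     * Split text into consecutive segments. Assumption: each entry of
     * boundaries is the index of the first block of a new segment, so
     * k boundaries yield k + 1 segments.
     */
    static String[][][] split(String[][] text, int[] boundaries) {
        int[] starts = boundaries.clone();
        Arrays.sort(starts); // tolerate unsorted input
        String[][][] segments = new String[starts.length + 1][][];
        int from = 0;
        for (int k = 0; k <= starts.length; k++) {
            int to = (k < starts.length) ? starts[k] : text.length;
            if (to < 0 || to > text.length) {
                throw new IllegalArgumentException("boundary out of range: " + to);
            }
            segments[k] = Arrays.copyOfRange(text, from, to);
            from = to;
        }
        return segments;
    }

    public static void main(String[] args) {
        String[][] text = {{"a"}, {"b"}, {"c"}, {"d"}, {"e"}};
        String[][][] segments = split(text, new int[] {2, 4});
        for (String[][] segment : segments) {
            System.out.println(Arrays.deepToString(segment));
        }
    }
}
```

Under this convention, boundaries {2, 4} on a five-block text produce three segments containing two, two, and one block respectively.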