public class CountMapUtils
extends java.lang.Object
Count maps have a key representing items to count and java.lang.Number values as the counts. In many cases, the Number values are of type java.lang.Integer, but this is not necessarily the case. For example, a count map may have strings as keys and java.lang.Double scaled word frequencies as values.
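As an illustration of the two kinds of count map described above, here is a minimal sketch that builds a map with Integer counts and derives a second map with Double scaled frequencies. The class `CountMapExample` and both method names are hypothetical and are not part of CountMapUtils.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CountMapExample {
    /** Builds a count map with Integer values from a sequence of words. */
    static Map<String, Number> countWords(String... words) {
        Map<String, Number> counts = new LinkedHashMap<>();
        for (String word : words) {
            Number current = counts.get(word);
            int next = (current == null) ? 1 : current.intValue() + 1;
            counts.put(word, Integer.valueOf(next));
        }
        return counts;
    }

    /** Derives a count map with Double values: each count divided by the total. */
    static Map<String, Number> toRelativeFrequencies(Map<String, ? extends Number> counts) {
        double total = 0.0;
        for (Number n : counts.values()) {
            total += n.doubleValue();
        }
        Map<String, Number> scaled = new LinkedHashMap<>();
        for (Map.Entry<String, ? extends Number> e : counts.entrySet()) {
            scaled.put(e.getKey(), Double.valueOf(e.getValue().doubleValue() / total));
        }
        return scaled;
    }

    public static void main(String[] args) {
        Map<String, Number> counts = countWords("the", "cat", "the", "mat");
        System.out.println(counts);
        System.out.println(toRelativeFrequencies(counts));
    }
}
```

Both maps share the type `Map<String, Number>`, which is why callers should read values through `intValue()` or `doubleValue()` rather than casting to a specific subclass.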
Modifier | Constructor | Description |
---|---|---|
protected | `CountMapUtils()` | Don't allow instantiation but do allow overrides. |
Modifier and Type | Method | Description |
---|---|---|
`static <K> void` | `addCountMap(java.util.Map<K,java.lang.Number> destinationMap, java.util.Map<K,java.lang.Number> sourceMap)` | Add words/counts from one map to another. |
`static <K> java.util.Map<K,java.lang.Number>` | `booleanizeCountMap(java.util.Map<K,? extends java.lang.Number> map)` | Convert map values to integer 1 or 0. |
`static <V extends java.lang.Number> java.util.Map<java.lang.String,java.lang.Number>` | `convertKeysToStrings(java.util.Map<? extends java.lang.Object,V> countMap)` | Convert keys in count map to plain strings. |
`static <K> int` | `getCountOfWordsInCommon(java.util.Map<K,? extends java.lang.Number> countMap1, java.util.Map<K,? extends java.lang.Number> countMap2)` | Get count of words which two count maps share. |
`static double[]` | `getSummaryCountsFromCountMap(java.util.Map<? extends java.lang.Object,? extends java.lang.Number> map)` | Get summary counts from a count map. |
`static double` | `getSumOfCrossProducts(java.util.Map<? extends java.lang.Object,? extends java.lang.Number> countMap1, java.util.Map<? extends java.lang.Object,? extends java.lang.Number> countMap2)` | Get sum of cross products for counts in two maps. |
`static int` | `getTotalWordCount(java.util.Map<? extends java.lang.Object,? extends java.lang.Number> map)` | Get total count of words in map. |
`static <K> int` | `getWordCount(java.util.Map<K,? extends java.lang.Number> countMap, K word)` | Get count for a specific word form from a count map. |
`static <K> java.util.Set<K>` | `getWordsFromMap(java.util.Map<K,?> map)` | Get words from a map. |
`static <K> java.util.List` | `getWordsInCommon(java.util.Map<K,? extends java.lang.Number> countMap1, java.util.Map<K,? extends java.lang.Number> countMap2)` | Get list of words which two count maps share. |
`static <K> void` | `incrementCountMap(java.util.Map<K,java.lang.Number> destinationMap, java.util.Map<K,java.lang.Number> sourceMap)` | Increment words/counts in one map from another. |
`static java.util.Map<java.lang.String,java.lang.Number>` | `loadCountMapFromFile(java.io.File file)` | Load strings and counts into count map from a file. |
`static java.util.Map<java.lang.String,java.lang.Number>` | `loadCountMapFromFile(java.io.File file, java.lang.String encoding)` | Load strings and counts into count map from a file. |
`static java.util.Map<java.lang.String,java.lang.Number>` | `loadCountMapFromReader(java.io.Reader reader)` | Load strings and counts into count map from a reader. |
`static <K> java.util.Map<K,java.lang.Number>` | `scaleCountMap(java.util.Map<K,? extends java.lang.Number> map, double scaleFactor)` | Scale count entries in count map. |
`static <K,V extends java.lang.Number> java.util.Map<java.lang.String,java.lang.Number>` | `semiDeepClone(java.util.Map<K,V> countMap)` | Get semi-deep clone of a count map. |
`static java.lang.String[]` | `splitKeyedCountString(java.lang.String s)` | Split string at tab character. |
`static <K> void` | `subtractCountMap(java.util.Map<K,java.lang.Number> destinationMap, java.util.Map<K,java.lang.Number> sourceMap)` | Subtract words/counts in one map from another. |
`static <K> void` | `updateWordCountMap(K word, int count, java.util.Map<K,java.lang.Number> countMap)` | Updates counts for a word in a map. |
protected CountMapUtils()
public static double[] getSummaryCountsFromCountMap(java.util.Map<? extends java.lang.Object,? extends java.lang.Number> map)
map
- The map with string keys and Number counts as values.
public static int getTotalWordCount(java.util.Map<? extends java.lang.Object,? extends java.lang.Number> map)
map
- The map with string keys and Number counts as values.
public static double getSumOfCrossProducts(java.util.Map<? extends java.lang.Object,? extends java.lang.Number> countMap1, java.util.Map<? extends java.lang.Object,? extends java.lang.Number> countMap2)
countMap1
- First count map.
countMap2
- Second count map.
public static <K> java.util.Map<K,java.lang.Number> booleanizeCountMap(java.util.Map<K,? extends java.lang.Number> map)
map
- Count map to booleanize.
Non-zero counts are converted to integer 1; zero counts are converted to integer 0.
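The conversion rule can be sketched as follows. This is not the library's source: the class `BooleanizeSketch` is hypothetical, and it assumes the result is a new map that leaves the input untouched, which the documentation does not state explicitly.

```java
import java.util.HashMap;
import java.util.Map;

class BooleanizeSketch {
    /** Returns a new map in which every non-zero count becomes 1 and every zero count becomes 0. */
    static <K> Map<K, Number> booleanize(Map<K, ? extends Number> map) {
        Map<K, Number> result = new HashMap<>();
        for (Map.Entry<K, ? extends Number> e : map.entrySet()) {
            // Compare as double so fractional counts such as 0.25 still count as non-zero.
            int flag = (e.getValue().doubleValue() != 0.0) ? 1 : 0;
            result.put(e.getKey(), Integer.valueOf(flag));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Number> counts = new HashMap<>();
        counts.put("the", Integer.valueOf(5));
        counts.put("cat", Integer.valueOf(0));
        counts.put("mat", Double.valueOf(0.25));
        System.out.println(booleanize(counts));
    }
}
```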
public static <K> java.util.Map<K,java.lang.Number> scaleCountMap(java.util.Map<K,? extends java.lang.Number> map, double scaleFactor)
map
- The count map.
scaleFactor
- The double value by which to multiply each count value in the count map.
public static <K> java.util.Set<K> getWordsFromMap(java.util.Map<K,?> map)
map
- The map with arbitrary objects as keys and Number counts as values.
public static java.lang.String[] splitKeyedCountString(java.lang.String s)
s
- The string to split into a key and a count.
public static <K> void addCountMap(java.util.Map<K,java.lang.Number> destinationMap, java.util.Map<K,java.lang.Number> sourceMap)
destinationMap
- Destination map.
sourceMap
- Source map.
On output, the destination map is updated with words and counts from the source map. The key type for both input maps must be the same for this to make sense.
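A minimal sketch of this kind of merge, assuming all counts are integers. The class `AddCountMapSketch` and method `addCounts` are hypothetical stand-ins, and the way the real method handles non-integer Number values may differ.

```java
import java.util.HashMap;
import java.util.Map;

class AddCountMapSketch {
    /** Adds each source count to the matching destination count, creating entries as needed. */
    static <K> void addCounts(Map<K, Number> destinationMap, Map<K, Number> sourceMap) {
        for (Map.Entry<K, Number> e : sourceMap.entrySet()) {
            Number existing = destinationMap.get(e.getKey());
            int base = (existing == null) ? 0 : existing.intValue();
            // Assumption: integer arithmetic; fractional counts would be truncated here.
            destinationMap.put(e.getKey(), Integer.valueOf(base + e.getValue().intValue()));
        }
    }

    public static void main(String[] args) {
        Map<String, Number> destination = new HashMap<>();
        destination.put("the", Integer.valueOf(2));
        Map<String, Number> source = new HashMap<>();
        source.put("the", Integer.valueOf(3));
        source.put("cat", Integer.valueOf(1));
        addCounts(destination, source);
        System.out.println(destination);
    }
}
```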
public static <K> void incrementCountMap(java.util.Map<K,java.lang.Number> destinationMap, java.util.Map<K,java.lang.Number> sourceMap)
destinationMap
- Destination map.
sourceMap
- Source map.
The key types for the two input maps must be the same.
On output, the destination map counts are incremented by one for each word appearing in the source map. If a source word does not already appear in the destination, it is added with a count of one.
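The difference from a full add is that only the presence of a word in the source matters, not its count. A sketch under the assumption of integer counts follows; `IncrementCountMapSketch` and `incrementCounts` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

class IncrementCountMapSketch {
    /** Adds one to the destination count of every word that appears as a key in the source. */
    static <K> void incrementCounts(Map<K, Number> destinationMap, Map<K, Number> sourceMap) {
        for (K word : sourceMap.keySet()) {
            Number existing = destinationMap.get(word);
            // A word missing from the destination starts at a count of one.
            int next = (existing == null) ? 1 : existing.intValue() + 1;
            destinationMap.put(word, Integer.valueOf(next));
        }
    }

    public static void main(String[] args) {
        Map<String, Number> destination = new HashMap<>();
        destination.put("the", Integer.valueOf(4));
        Map<String, Number> source = new HashMap<>();
        source.put("the", Integer.valueOf(10));
        source.put("cat", Integer.valueOf(7));
        incrementCounts(destination, source);
        System.out.println(destination);
    }
}
```

This pattern is typical for document frequencies, where the destination counts how many source maps (for example, documents) contain each word.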
public static <K> void subtractCountMap(java.util.Map<K,java.lang.Number> destinationMap, java.util.Map<K,java.lang.Number> sourceMap)
destinationMap
- Destination map.
sourceMap
- Source map.
The key types for the two input maps must be the same.
On output, the destination map counts are updated by subtracting the counts for matching words in the source map. If the count goes to zero for any word in the destination, that word is removed from the destination map.
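A sketch of the subtract-and-prune behavior, assuming integer counts. `SubtractCountMapSketch` and `subtractCounts` are hypothetical names. The documentation does not say what happens when a result would be negative; this sketch simply keeps the negative value.

```java
import java.util.HashMap;
import java.util.Map;

class SubtractCountMapSketch {
    /** Subtracts source counts from matching destination entries and drops entries that reach zero. */
    static <K> void subtractCounts(Map<K, Number> destinationMap, Map<K, Number> sourceMap) {
        for (Map.Entry<K, Number> e : sourceMap.entrySet()) {
            K word = e.getKey();
            Number existing = destinationMap.get(word);
            if (existing == null) {
                continue; // Only words present in both maps are affected.
            }
            int remaining = existing.intValue() - e.getValue().intValue();
            if (remaining == 0) {
                destinationMap.remove(word);
            } else {
                destinationMap.put(word, Integer.valueOf(remaining));
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Number> destination = new HashMap<>();
        destination.put("the", Integer.valueOf(5));
        destination.put("cat", Integer.valueOf(1));
        Map<String, Number> source = new HashMap<>();
        source.put("the", Integer.valueOf(2));
        source.put("cat", Integer.valueOf(1));
        source.put("dog", Integer.valueOf(3));
        subtractCounts(destination, source);
        System.out.println(destination);
    }
}
```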
public static <K> java.util.List getWordsInCommon(java.util.Map<K,? extends java.lang.Number> countMap1, java.util.Map<K,? extends java.lang.Number> countMap2)
countMap1
- First count map.
countMap2
- Second count map.
public static <K> int getCountOfWordsInCommon(java.util.Map<K,? extends java.lang.Number> countMap1, java.util.Map<K,? extends java.lang.Number> countMap2)
countMap1
- First count map.
countMap2
- Second count map.
public static <K> int getWordCount(java.util.Map<K,? extends java.lang.Number> countMap, K word)
countMap
- The word count map.
word
- The word text.
public static <K> void updateWordCountMap(K word, int count, java.util.Map<K,java.lang.Number> countMap)
word
- The word.
count
- The word count.
countMap
- The word count map.
public static <V extends java.lang.Number> java.util.Map<java.lang.String,java.lang.Number> convertKeysToStrings(java.util.Map<? extends java.lang.Object,V> countMap)
countMap
- The count map whose keys should be converted to strings.
Each key element's toString() method is called to convert the key object to a plain text string. Key elements without a toString() method will not be added to the result map. The object values (counts) are left untouched. Keys should not be null; null keys are ignored.
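The key conversion can be sketched as below. `ConvertKeysSketch` and `convertKeys` are hypothetical names, and the sketch covers only the two behaviors stated above: `toString()` on each key and skipping of null keys.

```java
import java.util.HashMap;
import java.util.Map;

class ConvertKeysSketch {
    /** Returns a new map keyed by each original key's toString() value; counts are reused as-is. */
    static <V extends Number> Map<String, Number> convertKeys(Map<?, V> countMap) {
        Map<String, Number> result = new HashMap<>();
        for (Map.Entry<?, V> e : countMap.entrySet()) {
            Object key = e.getKey();
            if (key == null) {
                continue; // Null keys are ignored.
            }
            result.put(key.toString(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Object, Integer> counts = new HashMap<>();
        counts.put(Integer.valueOf(42), Integer.valueOf(3));
        counts.put(null, Integer.valueOf(9));
        System.out.println(convertKeys(counts));
    }
}
```

Note that two distinct keys with the same string form collapse into one entry in a sketch like this; the documentation does not say how the library treats that case.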
public static java.util.Map<java.lang.String,java.lang.Number> loadCountMapFromReader(java.io.Reader reader) throws java.io.IOException, InvalidDataException
reader
- The reader.
java.io.IOException
- when an I/O error occurs while reading the input.
InvalidDataException
- when an input line is badly structured (too many or too few tokens) or the count token cannot be converted to an integer.
Each line of the input has one string, followed by an ASCII tab character, followed by an integer count.
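The line format can be illustrated with a small parser. This is a sketch, not the library's implementation: `LoadCountMapSketch` and `load` are hypothetical names, and `IllegalArgumentException` stands in for the library's `InvalidDataException`, whose constructors are not shown in this documentation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

class LoadCountMapSketch {
    /** Reads "string TAB integer" lines into a count map. */
    static Map<String, Number> load(Reader reader) throws IOException {
        Map<String, Number> counts = new LinkedHashMap<>();
        BufferedReader lines = new BufferedReader(reader);
        String line;
        while ((line = lines.readLine()) != null) {
            // A limit of -1 keeps trailing empty tokens so "word\t" is seen as two tokens.
            String[] tokens = line.split("\t", -1);
            if (tokens.length != 2) {
                throw new IllegalArgumentException("Expected 2 tab-separated tokens: " + line);
            }
            try {
                counts.put(tokens[0], Integer.valueOf(tokens[1].trim()));
            } catch (NumberFormatException ex) {
                throw new IllegalArgumentException("Count is not an integer: " + line, ex);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(load(new StringReader("the\t3\ncat\t1\n")));
    }
}
```

In this sketch a repeated key overwrites the earlier count and blank lines are rejected as badly structured; the documentation does not specify either case.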
public static java.util.Map<java.lang.String,java.lang.Number> loadCountMapFromFile(java.io.File file, java.lang.String encoding) throws java.io.IOException, InvalidDataException
file
- The file.
encoding
- Character encoding for file.
java.io.IOException
- when an I/O error occurs while reading the input.
InvalidDataException
- when an input line is badly structured (too many or too few tokens) or the count token cannot be converted to an integer.
Each line of the input file has one string, followed by an ASCII tab character, followed by a count. The counts may be integers or floating point values.
public static java.util.Map<java.lang.String,java.lang.Number> loadCountMapFromFile(java.io.File file) throws java.io.IOException, InvalidDataException
file
- The file.
java.io.IOException
- when an I/O error occurs while reading the input.
InvalidDataException
- when an input line is badly structured (too many or too few tokens) or the count token cannot be converted to an integer.
Each line of the input file has one string, followed by an ASCII tab character, followed by a count. The counts may be integers or floating point values. The input file is assumed to be in UTF-8 format.
public static <K,V extends java.lang.Number> java.util.Map<java.lang.String,java.lang.Number> semiDeepClone(java.util.Map<K,V> countMap)
countMap
- The count map to clone.
A semi-deep clone creates a new count map by duplicating each count value and creating new string keys using "toString()" on the original map's key values. For a count map with string keys, this produces a deep clone. For a count map with non-string keys, this produces what might be called a semi-deep clone, since only the string values of the original keys are duplicated.
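A sketch of the idea, with hypothetical names `SemiDeepCloneSketch` and `semiDeepClone`. How the library duplicates an arbitrary Number is not documented here, so this sketch assumes Integer values are copied as Integer and everything else as Double.

```java
import java.util.HashMap;
import java.util.Map;

class SemiDeepCloneSketch {
    /** Builds an independent map with string keys and freshly created count values. */
    static <K, V extends Number> Map<String, Number> semiDeepClone(Map<K, V> countMap) {
        Map<String, Number> clone = new HashMap<>();
        for (Map.Entry<K, V> e : countMap.entrySet()) {
            Number value = e.getValue();
            Number copy;
            if (value instanceof Integer) {
                copy = Integer.valueOf(value.intValue());
            } else {
                // Assumption: any non-Integer count is copied as a Double.
                copy = Double.valueOf(value.doubleValue());
            }
            // String.valueOf calls toString() on non-null keys.
            clone.put(String.valueOf(e.getKey()), copy);
        }
        return clone;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> original = new HashMap<>();
        original.put(Integer.valueOf(7), Integer.valueOf(2));
        Map<String, Number> clone = semiDeepClone(original);
        clone.put("7", Integer.valueOf(99));
        System.out.println(original + " " + clone);
    }
}
```

Changing the clone leaves the original map untouched, which is the practical point of cloning before a destructive operation such as a subtract.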