public class Abbreviations
extends java.lang.Object
Holds a list of common abbreviations along with information about whether each abbreviation can normally end a sentence or not. Also provides patterns and methods for determining if a string is a possible abbreviation.
Modifier and Type | Field and Description |
---|---|
protected static java.util.regex.Matcher | abbreviationMatcher |
protected static java.util.regex.Pattern | abbreviationPattern |
protected UTF8Properties | abbreviations |
static java.lang.String | defaultAbbreviationPattern |
protected static java.lang.String | defaultAbbreviationsFileName |
protected static java.util.regex.Matcher | initialMatcher |
protected static java.util.regex.Pattern | initialPattern |
protected static java.util.regex.Matcher | possessiveInitialMatcher |
protected static java.util.regex.Pattern | possessiveInitialPattern |
Constructor and Description |
---|
Abbreviations(): Create abbreviation detector. |
Abbreviations(java.lang.String langCode): Create abbreviation detector for specified ISO language code. |
Modifier and Type | Method and Description |
---|---|
static java.lang.String | createAbbreviationsPattern(UTF8Properties abbreviations): Create abbreviations pattern from list of abbreviations. |
UTF8Properties | getAbbreviations(): Return current abbreviations. |
int | getAbbreviationsCount(): Get count of known abbreviations. |
boolean | isAbbreviation(java.lang.String str): Checks if string is a probable abbreviation. |
boolean | isEOSAbbreviation(java.lang.String str): Checks if string is an abbreviation on which a sentence can end. |
static boolean | isInitial(java.lang.String str): Checks if string is an initial. |
boolean | isKnownAbbreviation(java.lang.String str): Checks if string is a known abbreviation. |
static boolean | isPossessiveInitial(java.lang.String str): Checks if string is a possible possessive initial. |
boolean | loadAbbreviations(java.lang.String abbreviationsURL): Load abbreviations list from a properties file. |
static UTF8Properties | loadAbbreviationsFromResource(java.lang.String langCode): Load abbreviations list from resource properties file. |
public static java.lang.String defaultAbbreviationPattern
protected static java.util.regex.Pattern abbreviationPattern
protected static java.util.regex.Matcher abbreviationMatcher
protected static java.util.regex.Pattern initialPattern
protected static java.util.regex.Matcher initialMatcher
protected static java.util.regex.Pattern possessiveInitialPattern
protected static java.util.regex.Matcher possessiveInitialMatcher
protected UTF8Properties abbreviations
protected static final java.lang.String defaultAbbreviationsFileName
public Abbreviations()
public Abbreviations(java.lang.String langCode)
langCode
- ISO language code.
public static UTF8Properties loadAbbreviationsFromResource(java.lang.String langCode)
Each line in the UTF8 abbreviations property file takes the form:
abbrev.=n
where a value of 1 for n indicates the abbreviation can normally end a sentence and a value of 0 for n indicates the abbreviation normally cannot end a sentence.
If there is no resource file for the given language code, the abbreviations list will be empty.
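The file format described above can be illustrated with a minimal sketch. It assumes that `java.util.Properties` is an acceptable stand-in for `UTF8Properties` (whose API is not shown here); the class name, the `parse` helper, and the sample entries are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical demo class; java.util.Properties stands in for UTF8Properties.
class AbbreviationFileFormatDemo {
    // Sample content in the documented "abbrev.=n" form (entries are made up).
    static final String SAMPLE = "Mr.=0\nDr.=0\netc.=1\n";

    // Parse property-file text into abbreviation/flag pairs.
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Reading from an in-memory string should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(SAMPLE);
        for (String abbrev : props.stringPropertyNames()) {
            // A value of 1 means the abbreviation can normally end a sentence.
            boolean canEndSentence = "1".equals(props.getProperty(abbrev));
            System.out.println(abbrev + " can end a sentence: " + canEndSentence);
        }
    }
}
```

Note that the trailing period is part of the key, so lookups must use `"Mr."` rather than `"Mr"`.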
public boolean loadAbbreviations(java.lang.String abbreviationsURL)
abbreviationsURL
- Abbreviations URL.
Each line in the UTF8 abbreviations property file takes the form:
abbrev.=n
where a value of 1 for n indicates the abbreviation can normally end a sentence and a value of 0 for n indicates the abbreviation normally cannot end a sentence.
public boolean isKnownAbbreviation(java.lang.String str)
str
- The string to check.
public boolean isAbbreviation(java.lang.String str)
str
- The string to check.
A string is declared to be a probable abbreviation if it appears in the abbreviation list or matches the abbreviation pattern.
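The "known list or pattern" rule can be sketched as follows. The value of `defaultAbbreviationPattern` is not given in this documentation, so the regular expression below is purely an assumption chosen for illustration, as are the class name, the method name, and the sample list.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch; not the library's implementation.
class ProbableAbbreviationSketch {
    // Made-up list of known abbreviations.
    static final Set<String> KNOWN =
        new HashSet<>(Arrays.asList("Mr.", "Dr.", "etc."));

    // Assumed pattern: two or more "letter + period" groups, such as "U.S.".
    static final Pattern SHAPE = Pattern.compile("(\\p{L}\\.){2,}");

    // True if the string is in the list or has the assumed abbreviation shape.
    static boolean isProbableAbbreviation(String str) {
        if (str == null) {
            return false;
        }
        return KNOWN.contains(str) || SHAPE.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(isProbableAbbreviation("Mr."));   // in the list
        System.out.println(isProbableAbbreviation("U.S."));  // matches the pattern
        System.out.println(isProbableAbbreviation("hello")); // neither
    }
}
```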
public boolean isEOSAbbreviation(java.lang.String str)
str
- The string to check.
A string is declared to be a probable sentence-ending abbreviation if it appears in the abbreviation list and it has a sentence-ending value of 1.
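A minimal sketch of this check, again using `java.util.Properties` as an assumed stand-in for `UTF8Properties`; the class, the helper names, and the sample entries are hypothetical.

```java
import java.util.Properties;

// Hypothetical sketch; not the library's implementation.
class SentenceEndingAbbreviationSketch {
    // Made-up abbreviation list: 0 = cannot end a sentence, 1 = can.
    static Properties sample() {
        Properties p = new Properties();
        p.setProperty("Mr.", "0");
        p.setProperty("etc.", "1");
        return p;
    }

    // True only if the string is listed and its value is exactly "1".
    static boolean isSentenceEnding(Properties abbreviations, String str) {
        // getProperty returns null for unlisted keys, so equals() yields false.
        return str != null && "1".equals(abbreviations.getProperty(str));
    }

    public static void main(String[] args) {
        Properties abbreviations = sample();
        System.out.println(isSentenceEnding(abbreviations, "etc."));
        System.out.println(isSentenceEnding(abbreviations, "Mr."));
        System.out.println(isSentenceEnding(abbreviations, "Prof."));
    }
}
```

An unlisted string is therefore treated the same as a listed abbreviation with value 0: neither counts as sentence-ending.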
public static boolean isInitial(java.lang.String str)
str
- The string to check.
A string is an initial when it takes the form "L." where L is a capital letter.
public static boolean isPossessiveInitial(java.lang.String str)
str
- The string to check.
A string is a possible possessive initial when it takes the form "L.'s" where L is a capital letter.
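The "L." and "L.'s" forms can be expressed as regular expressions. The actual values of `initialPattern` and `possessiveInitialPattern` are not shown in this documentation; the sketch below assumes that "capital letter" means any Unicode uppercase letter (`\p{Lu}`) and that only the ASCII apostrophe is accepted. The class name is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; the library's real patterns may differ.
class InitialPatternsSketch {
    // Assumed form of an initial: one uppercase letter followed by a period.
    static final Pattern INITIAL = Pattern.compile("\\p{Lu}\\.");

    // Assumed form of a possessive initial: an initial followed by 's.
    static final Pattern POSSESSIVE_INITIAL = Pattern.compile("\\p{Lu}\\.'s");

    static boolean isInitial(String str) {
        return str != null && INITIAL.matcher(str).matches();
    }

    static boolean isPossessiveInitial(String str) {
        return str != null && POSSESSIVE_INITIAL.matcher(str).matches();
    }

    public static void main(String[] args) {
        System.out.println(isInitial("J."));             // true
        System.out.println(isInitial("j."));             // false: lowercase
        System.out.println(isPossessiveInitial("J.'s")); // true
    }
}
```

The sketch creates a fresh `Matcher` per call instead of sharing one in a static field, since a `java.util.regex.Matcher` instance is not safe for concurrent use.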
public int getAbbreviationsCount()
public UTF8Properties getAbbreviations()
public static java.lang.String createAbbreviationsPattern(UTF8Properties abbreviations)
abbreviations
- Abbreviations for which to create pattern.
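This documentation does not say how the pattern string is built. One plausible approach, shown here only as an assumption, is to join the quoted abbreviation keys into a regular expression alternation. The class and method names are hypothetical, and `java.util.Properties` stands in for `UTF8Properties`.

```java
import java.util.Properties;
import java.util.TreeSet;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch; not the library's implementation.
class AbbreviationsPatternSketch {
    // Build a regex string that matches any one of the listed abbreviations.
    static String createPattern(Properties abbreviations) {
        // Sort the keys so the generated pattern is deterministic.
        return new TreeSet<>(abbreviations.stringPropertyNames()).stream()
            // Quote each key so "." is matched literally, not as a wildcard.
            .map(Pattern::quote)
            .collect(Collectors.joining("|", "(?:", ")"));
    }

    public static void main(String[] args) {
        Properties abbreviations = new Properties();
        abbreviations.setProperty("Mr.", "0");
        abbreviations.setProperty("etc.", "1");
        String regex = createPattern(abbreviations);
        System.out.println(Pattern.matches(regex, "Mr."));
        System.out.println(Pattern.matches(regex, "Mrx"));
    }
}
```

With an empty list this sketch produces `(?:)`, which matches only the empty string, so a caller would likely want to handle that case separately.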