public class MorphAdornerSettings
extends java.lang.Object
implements java.io.Serializable
This class holds the static values of global settings for MorphAdorner.
The MorphAdorner resource bundle contains definitions for all the string constants used in MorphAdorner. Once the resource strings have been initialized by the initializeSettings method, you can access them by calling getString:
String s = MorphAdornerSettings.getString( "mystring" , "my string" );
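As a rough illustration of the lookup described above, the following sketch wraps a java.util.ResourceBundle and falls back to a default when the key is missing. The class name ResourceStringsSketch, the in-memory bundle, and its single entry are assumptions made for this example and are not MorphAdorner code. The underscore-to-space replacement follows the note in the getString method details and is assumed here to apply to the returned value, including the default.

```java
import java.util.ListResourceBundle;
import java.util.MissingResourceException;
import java.util.ResourceBundle;

/** Hypothetical stand-in for a settings class that wraps a ResourceBundle. */
class ResourceStringsSketch {
    /** In-memory bundle used here instead of a real resource file (assumption). */
    static final ResourceBundle BUNDLE = new ListResourceBundle() {
        @Override
        protected Object[][] getContents() {
            return new Object[][] {
                { "mystring", "my_string_from_bundle" }
            };
        }
    };

    /** Look up a string, fall back to the default, then turn '_' into ' '. */
    static String getString(String resourceName, String defaultValue) {
        String result;
        try {
            result = BUNDLE.getString(resourceName);
        } catch (MissingResourceException e) {
            // Key not present in the bundle: use the caller's default.
            result = defaultValue;
        }
        return (result == null) ? null : result.replace('_', ' ');
    }

    public static void main(String[] args) {
        System.out.println(getString("mystring", "my string"));
        System.out.println(getString("missing", "fallback_text"));
    }
}
```

With this bundle, a known key returns the bundle value with underscores turned into spaces, and an unknown key returns the supplied default.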
Modifier and Type | Class | Description |
---|---|---|
static class | MorphAdornerSettings.XMLIDType | XML ID types. |
Modifier and Type | Field | Description |
---|---|---|
java.lang.String | abbreviationsMainTextURL | Abbreviations URL for main text. |
java.lang.String | abbreviationsSideTextURL | Abbreviations URL for side text. |
java.lang.String | abbreviationsURL | Abbreviations URL. |
boolean | adornExistingXMLFiles | Adorn XML files with existing adorned version in output directory. |
boolean | allowLowerCaseProperNouns | Allow proper nouns to be lower case when part of speech tagging. |
java.net.URL[] | alternateSpellingsByWordClassURLs | |
protected java.lang.String[] | alternateSpellingsByWordClassURLStrings | Alternate spellings by word class URL. |
java.net.URL[] | alternateSpellingsURLs | |
protected java.lang.String[] | alternateSpellingsURLStrings | Alternate spellings URLs. |
boolean | closeSentenceAtEndOfHardTag | Close sentence at end of hard tag. |
boolean | closeSentenceAtEndOfJumpTag | Close sentence at end of jump tag. |
java.net.URL | contextRulesURL | |
protected java.lang.String | contextRulesURLString | Context rules URL. |
boolean | debug | True if debugging enabled. |
protected java.lang.String | defaultPropertiesURLString | MorphAdorner properties file URL. |
java.lang.String | disallowWordElementsIn | XML tags from which to remove word tags. |
java.lang.String[] | fileNames | File names to process. |
boolean | fixGapTags | Fix gap tags. |
boolean | fixOrigTags | Fix orig tags. |
boolean | fixSplitWords | Fix selected split words in input text. |
java.util.List<PatternReplacer> | fixSplitWordsPatternReplacers | Fix split words pattern replacers. |
boolean | ignoreLexiconEntriesForLemmatization | Ignore lemma in lexicon when lemmatizing. |
protected boolean | initialized | True if MorphAdorner already initialized. |
java.net.URL | lexicalRulesURL | |
protected java.lang.String | lexicalRulesURLString | Lexical rules URL. |
MorphAdornerLogger | morphAdornerLogger | Logger. |
java.lang.String | outputDirectoryName | Output directory name. |
boolean | outputEOSFlag | Output end of sentence flag. |
java.lang.String | outputEOSFlagAttribute | Output end of sentence flag attribute. |
boolean | outputKWIC | Output KWIC index. |
int | outputKWICWidth | Number of characters in KWIC index entry. |
java.lang.String | outputLeftKWICAttribute | Output left KWIC index attribute. |
boolean | outputLemma | Output lemma. |
java.lang.String | outputLemmaAttribute | Output lemma attribute. |
boolean | outputNonredundantAttributesOnly | Output only non-redundant word level attributes. |
boolean | outputNonredundantEosAttribute | Output only non-redundant eos attribute. |
boolean | outputNonredundantPartAttribute | Output only non-redundant part attribute. |
boolean | outputNonredundantTokenAttribute | Output only non-redundant token attribute. |
boolean | outputOriginalToken | Output original token. |
java.lang.String | outputOriginalTokenAttribute | Output original token attribute. |
boolean | outputPartOfSpeech | Output part of speech tag. |
java.lang.String | outputPartOfSpeechAttribute | Output part of speech tag attribute. |
boolean | outputPseudoPageBoundaryMilestones | Output pseudo-page boundary milestones. |
java.lang.String | outputRightKWICAttribute | Output right KWIC index attribute. |
boolean | outputRunningWordNumbers | Output running word number. |
boolean | outputSentenceBoundaryMilestones | Output sentence boundary milestones. |
boolean | outputSentenceNumber | Output sentence number. |
java.lang.String | outputSentenceNumberAttribute | Output sentence number attribute. |
boolean | outputSpelling | Output (corrected) spelling. |
java.lang.String | outputSpellingAttribute | Output spelling attribute. |
boolean | outputStandardSpelling | Output standard spelling. |
java.lang.String | outputStandardSpellingAttribute | Output standard spelling attribute. |
boolean | outputWhitespaceElements | Output whitespace elements. |
boolean | outputWordNumber | Output word number. |
java.lang.String | outputWordNumberAttribute | Output word number attribute. |
boolean | outputWordOrdinal | Output word ordinal. |
java.lang.String | outputWordOrdinalAttribute | Output word ordinal attribute. |
java.lang.String | programBanner | The program banner (title and version number). |
java.lang.String | programTitle | The program name. |
java.lang.String | programVersion | The program version. |
UTF8Properties | properties | MorphAdorner configuration properties. |
java.net.URL | propertiesURL | |
protected java.lang.String | propertiesURLString | |
java.lang.String | pseudoPageContainerDivTypes | List of pseudo-page ending div types. |
int | pseudoPageSize | Pseudo-page size. |
protected java.util.ResourceBundle | resourceBundle | The resource strings. |
protected static java.lang.String | resourceName | Resource bundle path. |
java.net.URL | spellingsURL | |
protected java.lang.String | spellingsURLString | Standard spellings URL. |
protected static java.lang.String | STR_ADORN_EXISTING_XML_FILES | |
protected static java.lang.String | STR_CLOSE_SENTENCE_AT_END_OF_HARD_TAG | |
protected static java.lang.String | STR_CLOSE_SENTENCE_AT_END_OF_JUMP_TAG | |
protected static java.lang.String | STR_CONTEXT_RULES | |
protected static java.lang.String | STR_DISALLOW_WORD_ELEMENTS_IN | |
protected static java.lang.String | STR_DOCTYPE_NAME | |
protected static java.lang.String | STR_DOCTYPE_SYSTEM | |
protected static java.lang.String | STR_ENTITIES_MERGE | |
protected static java.lang.String | STR_ENTITIES_NOT_FILES | |
protected static java.lang.String | STR_ENTITIES_TREAT_ALL | |
protected static java.lang.String | STR_FIELD_DELIMITERS | |
protected static java.lang.String | STR_FIX_GAP_TAGS | |
protected static java.lang.String | STR_FIX_ORIG_TAGS | |
protected static java.lang.String | STR_FIX_SPLIT_WORDS | |
protected static java.lang.String | STR_ID | XGTagger configuration item names. |
protected static java.lang.String | STR_ID_SPACING | |
protected static java.lang.String | STR_ID_TYPE | |
protected static java.lang.String | STR_IGNORE_TAG_CASE | |
protected static java.lang.String | STR_JUMP_TAGS | |
protected static java.lang.String | STR_LEXICAL_RULES | |
protected static java.lang.String | STR_LOG | |
protected static java.lang.String | STR_OUTPUT_FILE | |
protected static java.lang.String | STR_OUTPUT_NONREDUNDANT_ATTRIBUTES_ONLY | |
protected static java.lang.String | STR_OUTPUT_NONREDUNDANT_EOS_ATTRIBUTE | |
protected static java.lang.String | STR_OUTPUT_NONREDUNDANT_PART_ATTRIBUTE | |
protected static java.lang.String | STR_OUTPUT_NONREDUNDANT_TOKEN_ATTRIBUTE | |
protected static java.lang.String | STR_OUTPUT_PSEUDO_PAGE_BOUNDARY_MILESTONES | |
protected static java.lang.String | STR_OUTPUT_SENTENCE_BOUNDARY_MILESTONES | |
protected static java.lang.String | STR_OUTPUT_WHITESPACE_ELEMENTS | |
protected static java.lang.String | STR_PSEUDO_PAGE_CONTAINER_DIV_TYPES | |
protected static java.lang.String | STR_PSEUDO_PAGE_SIZE | |
protected static java.lang.String | STR_PUNC_TAG_NAME | |
protected static java.lang.String | STR_RELATIVE_URI_BASE | |
protected static java.lang.String | STR_REPEAT_ATTRIBUTES | |
protected static java.lang.String | STR_SOFT_TAGS | |
protected static java.lang.String | STR_SPECIAL_SEPARATOR | |
protected static java.lang.String | STR_SPELLING_PAIRS | |
protected static java.lang.String | STR_SPELLING_PAIRS_BY_WORD_CLASS | |
protected static java.lang.String | STR_STANDARD_SPELLINGS | |
protected static java.lang.String | STR_SUFFIX_LEXICON | |
protected static java.lang.String | STR_SURROUND_MARKER | |
protected static java.lang.String | STR_TAGS_PATH | |
protected static java.lang.String | STR_TOKENLABEL_ATTRIBUTE | |
protected static java.lang.String | STR_TOKENLABEL_EMIT | |
protected static java.lang.String | STR_TOKENLABEL_PREPENDWORKNAME | |
protected static java.lang.String | STR_TOKENLABEL_SPACING | |
protected static java.lang.String | STR_TRANSITION_MATRIX | |
protected static java.lang.String | STR_USE_PC_TO_MARK_END_OF_SENTENCE | |
protected static java.lang.String | STR_WORD_DELIMITERS | |
protected static java.lang.String | STR_WORD_FIELD | |
protected static java.lang.String | STR_WORD_LEXICON | |
protected static java.lang.String | STR_WORD_PATH | |
protected static java.lang.String | STR_WORD_TAG_NAME | |
protected static java.lang.String | STR_XMLSCHEMA | |
java.net.URL | suffixLexiconURL | |
protected java.lang.String | suffixLexiconURLString | Suffix lexicon URL. |
boolean | tokenizeOnly | Tokenize only: Override other output than tokenization for XML. |
java.net.URL | transitionMatrixURL | |
protected java.lang.String | transitionMatrixURLString | Transition matrix URL. |
boolean | tryStandardSpellings | Try standard spellings when guessing parts of speech. |
boolean | useLatinWordList | Use Latin word list. |
boolean | usePCToMarkEndOfSentence | Use PC element to mark end of sentence. |
boolean | useXMLHandler | Use XGTagger-based XML handler. |
java.net.URL | wordLexiconURL | |
protected java.lang.String | wordLexiconURLString | Word lexicon URL. |
XGOptions | xgOptions | XGTagger configuration properties. |
java.lang.String | xmlDoctypeName | XML doctype name for output. |
java.lang.String | xmlDoctypeSystem | XML doctype system (DTD) for output. |
int | xmlIDSpacing | XML ID spacing. |
MorphAdornerSettings.XMLIDType | xmlIDType | XML ID Type. |
java.lang.String | xmlSchema | XML schema to use when parsing XML input files. |
java.lang.String | xmlTokenLabelAttribute | XML token label attribute. |
boolean | xmlTokenLabelEmit | Emit XML token label? |
boolean | xmlTokenLabelPrependWorkName | XML token label prepend work name. |
int | xmlTokenLabelSpacing | XML token label spacing. |
java.util.List<java.lang.String> | xmlWordAttributes | XML word attributes. |
Constructor | Description |
---|---|
MorphAdornerSettings() | Create MorphAdorner settings. |
Modifier and Type | Method | Description |
---|---|---|
protected int | entityReferenceHandling() | Get XGTagger entity reference handling. |
protected boolean | getBooleanProperty(java.lang.String name, boolean defaultValue) | Get a boolean configuration property. |
protected java.lang.String | getBooleanStringProperty(java.lang.String name, java.lang.String defaultValue) | Get a boolean string configuration property. |
protected void | getCommandLineParameters(java.lang.String[] args) | Get command line parameters. |
protected int | getIntegerProperty(java.lang.String name, int defaultValue) | Get an integer configuration property. |
java.lang.String | getMorphAdornerVersion() | Return MorphAdorner program version. |
void | getOptions() | Get options from properties file. |
MorphAdornerSettings | getSettings() | Return settings. |
void | getSettings(java.lang.String[] args) | Get program settings. |
java.lang.String | getString(java.lang.String resourceName) | Get string from ResourceBundle. |
java.lang.String | getString(java.lang.String resourceName, java.lang.String defaultValue) | Get string from ResourceBundle. |
protected java.lang.String | getStringProperty(java.lang.String name, java.lang.String defaultValue) | Get a string configuration property. |
java.lang.String[] | getStrings(java.lang.String resourceName, java.lang.String[] defaults) | Parse ResourceBundle for a String array. |
java.lang.String | getXMLWordAttribute(int attrIndex) | Get XML word attribute. |
java.util.List<java.lang.String> | getXMLWordAttributes() | Get XML word attributes. |
protected static void | help() | Prints the help message. |
void | initializeSettings(MorphAdornerLogger morphAdornerLogger) | Initialize MorphAdorner settings. |
void | loadProperties() | Get MorphAdorner properties and add them to the System properties. |
void | rectifyOptions() | Rectify options. |
protected int | setDelimiters() | Sets word delimiters. |
protected int | setIDs() | Sets the Ids property. |
protected int | setLogFileNames() | Sets the log file name. |
protected int | setPaths() | Sets the Path property. |
int | setXGOptions() | Set XGTagger options. |
void | setXMLWordAttributes(boolean outputOriginalToken, boolean outputLemma, boolean outputStandardSpelling) | Set word attribute names for XML output. |
java.lang.String[] | splitStrings(java.lang.String input) | Split string into a series of substrings on whitespace boundaries. |
java.lang.String | stripQuotes(java.lang.String strText) | Removes any quotation marks around the property. |
protected java.util.ResourceBundle resourceBundle
protected static java.lang.String resourceName
public java.lang.String programTitle
public java.lang.String programVersion
public java.lang.String programBanner
public MorphAdornerLogger morphAdornerLogger
public UTF8Properties properties
public XGOptions xgOptions
public boolean debug
protected boolean initialized
public java.lang.String outputDirectoryName
protected java.lang.String wordLexiconURLString
public java.net.URL wordLexiconURL
protected java.lang.String suffixLexiconURLString
public java.net.URL suffixLexiconURL
protected java.lang.String contextRulesURLString
public java.net.URL contextRulesURL
protected java.lang.String lexicalRulesURLString
public java.net.URL lexicalRulesURL
protected java.lang.String spellingsURLString
public java.net.URL spellingsURL
protected java.lang.String[] alternateSpellingsURLStrings
public java.net.URL[] alternateSpellingsURLs
protected java.lang.String[] alternateSpellingsByWordClassURLStrings
public java.net.URL[] alternateSpellingsByWordClassURLs
protected java.lang.String transitionMatrixURLString
public java.net.URL transitionMatrixURL
protected java.lang.String defaultPropertiesURLString
protected java.lang.String propertiesURLString
public java.net.URL propertiesURL
public java.lang.String[] fileNames
public boolean outputSentenceNumber
public java.lang.String outputSentenceNumberAttribute
public java.lang.String outputWordOrdinalAttribute
public boolean outputWordOrdinal
public boolean outputWordNumber
public java.lang.String outputWordNumberAttribute
public boolean outputRunningWordNumbers
public boolean outputSpelling
public java.lang.String outputSpellingAttribute
public boolean outputOriginalToken
public java.lang.String outputOriginalTokenAttribute
public boolean outputPartOfSpeech
public java.lang.String outputPartOfSpeechAttribute
public boolean outputLemma
public java.lang.String outputLemmaAttribute
public boolean outputStandardSpelling
public java.lang.String outputStandardSpellingAttribute
public boolean outputKWIC
public java.lang.String outputLeftKWICAttribute
public java.lang.String outputRightKWICAttribute
public int outputKWICWidth
public boolean outputEOSFlag
public java.lang.String outputEOSFlagAttribute
public java.lang.String xmlDoctypeName
public java.lang.String xmlDoctypeSystem
public boolean useXMLHandler
public boolean ignoreLexiconEntriesForLemmatization
public boolean tryStandardSpellings
public boolean useLatinWordList
public boolean outputWhitespaceElements
public boolean outputNonredundantAttributesOnly
public boolean outputNonredundantTokenAttribute
public boolean outputNonredundantPartAttribute
public boolean outputNonredundantEosAttribute
public boolean outputSentenceBoundaryMilestones
public boolean usePCToMarkEndOfSentence
public boolean allowLowerCaseProperNouns
public boolean fixGapTags
public boolean fixOrigTags
public boolean fixSplitWords
public java.util.List<PatternReplacer> fixSplitWordsPatternReplacers
public int pseudoPageSize
public boolean outputPseudoPageBoundaryMilestones
public java.lang.String pseudoPageContainerDivTypes
public boolean closeSentenceAtEndOfHardTag
public boolean closeSentenceAtEndOfJumpTag
public java.lang.String xmlSchema
public java.lang.String disallowWordElementsIn
public MorphAdornerSettings.XMLIDType xmlIDType
public int xmlIDSpacing
public boolean xmlTokenLabelEmit
public java.lang.String xmlTokenLabelAttribute
public int xmlTokenLabelSpacing
public boolean xmlTokenLabelPrependWorkName
public java.util.List<java.lang.String> xmlWordAttributes
public java.lang.String abbreviationsURL
public java.lang.String abbreviationsMainTextURL
public java.lang.String abbreviationsSideTextURL
public boolean adornExistingXMLFiles
public boolean tokenizeOnly
protected static final java.lang.String STR_ID
protected static final java.lang.String STR_ID_TYPE
protected static final java.lang.String STR_ID_SPACING
protected static final java.lang.String STR_TOKENLABEL_EMIT
protected static final java.lang.String STR_TOKENLABEL_ATTRIBUTE
protected static final java.lang.String STR_TOKENLABEL_SPACING
protected static final java.lang.String STR_TOKENLABEL_PREPENDWORKNAME
protected static final java.lang.String STR_LOG
protected static final java.lang.String STR_WORD_PATH
protected static final java.lang.String STR_TAGS_PATH
protected static final java.lang.String STR_FIELD_DELIMITERS
protected static final java.lang.String STR_WORD_DELIMITERS
protected static final java.lang.String STR_SURROUND_MARKER
protected static final java.lang.String STR_WORD_FIELD
protected static final java.lang.String STR_OUTPUT_FILE
protected static final java.lang.String STR_ENTITIES_NOT_FILES
protected static final java.lang.String STR_ENTITIES_TREAT_ALL
protected static final java.lang.String STR_ENTITIES_MERGE
protected static final java.lang.String STR_RELATIVE_URI_BASE
protected static final java.lang.String STR_REPEAT_ATTRIBUTES
protected static final java.lang.String STR_JUMP_TAGS
protected static final java.lang.String STR_SOFT_TAGS
protected static final java.lang.String STR_PUNC_TAG_NAME
protected static final java.lang.String STR_WORD_TAG_NAME
protected static final java.lang.String STR_SPECIAL_SEPARATOR
protected static final java.lang.String STR_IGNORE_TAG_CASE
protected static final java.lang.String STR_DOCTYPE_NAME
protected static final java.lang.String STR_DOCTYPE_SYSTEM
protected static final java.lang.String STR_OUTPUT_WHITESPACE_ELEMENTS
protected static final java.lang.String STR_OUTPUT_NONREDUNDANT_ATTRIBUTES_ONLY
protected static final java.lang.String STR_OUTPUT_NONREDUNDANT_TOKEN_ATTRIBUTE
protected static final java.lang.String STR_OUTPUT_NONREDUNDANT_PART_ATTRIBUTE
protected static final java.lang.String STR_OUTPUT_NONREDUNDANT_EOS_ATTRIBUTE
protected static final java.lang.String STR_OUTPUT_SENTENCE_BOUNDARY_MILESTONES
protected static final java.lang.String STR_USE_PC_TO_MARK_END_OF_SENTENCE
protected static final java.lang.String STR_FIX_GAP_TAGS
protected static final java.lang.String STR_FIX_ORIG_TAGS
protected static final java.lang.String STR_FIX_SPLIT_WORDS
protected static final java.lang.String STR_PSEUDO_PAGE_SIZE
protected static final java.lang.String STR_OUTPUT_PSEUDO_PAGE_BOUNDARY_MILESTONES
protected static final java.lang.String STR_PSEUDO_PAGE_CONTAINER_DIV_TYPES
protected static final java.lang.String STR_CLOSE_SENTENCE_AT_END_OF_HARD_TAG
protected static final java.lang.String STR_CLOSE_SENTENCE_AT_END_OF_JUMP_TAG
protected static final java.lang.String STR_XMLSCHEMA
protected static final java.lang.String STR_WORD_LEXICON
protected static final java.lang.String STR_SUFFIX_LEXICON
protected static final java.lang.String STR_CONTEXT_RULES
protected static final java.lang.String STR_LEXICAL_RULES
protected static final java.lang.String STR_STANDARD_SPELLINGS
protected static final java.lang.String STR_SPELLING_PAIRS
protected static final java.lang.String STR_SPELLING_PAIRS_BY_WORD_CLASS
protected static final java.lang.String STR_TRANSITION_MATRIX
protected static final java.lang.String STR_DISALLOW_WORD_ELEMENTS_IN
protected static final java.lang.String STR_ADORN_EXISTING_XML_FILES
public void initializeSettings(MorphAdornerLogger morphAdornerLogger)
Parameters:
morphAdornerLogger - MorphAdorner logger.
public java.lang.String getString(java.lang.String resourceName, java.lang.String defaultValue)
Parameters:
resourceName - Name of resource to retrieve.
defaultValue - Default value for resource.
Underscore "_" characters are replaced by spaces.
public java.lang.String getString(java.lang.String resourceName)
Parameters:
resourceName - Name of resource to retrieve.
Underscore "_" characters are replaced by spaces.
public java.lang.String[] getStrings(java.lang.String resourceName, java.lang.String[] defaults)
Parameters:
resourceName - Name of resource.
defaults - Array of default string values.
public java.lang.String[] splitStrings(java.lang.String input)
Parameters:
input - Input string.
This is useful for retrieving an array of strings from the resource file. Underscore "_" characters are replaced by spaces.
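The whitespace splitting and string-array parsing described above might look roughly like the sketch below. SplitStringsSketch is a hypothetical class written for this example. Its getStrings takes the raw resource value directly rather than a resource name, and the handling of blank input is an assumption; none of this is taken from the MorphAdorner source.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating whitespace splitting of resource values. */
class SplitStringsSketch {
    /** Split on runs of whitespace; null or blank input gives an empty array. */
    static String[] splitStrings(String input) {
        if (input == null || input.trim().isEmpty()) {
            return new String[0];
        }
        return input.trim().split("\\s+");
    }

    /**
     * Split a raw resource value into pieces and replace '_' with ' ' in
     * each piece. Returns the defaults when no value is available.
     */
    static String[] getStrings(String rawValue, String[] defaults) {
        if (rawValue == null) {
            return defaults;
        }
        String[] parts = splitStrings(rawValue);
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].replace('_', ' ');
        }
        return parts;
    }

    public static void main(String[] args) {
        String raw = "New_York  Chicago\tLos_Angeles";
        System.out.println(Arrays.toString(getStrings(raw, new String[0])));
    }
}
```

Encoding spaces as underscores lets a single whitespace-separated resource value carry entries that themselves contain spaces.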
protected void getCommandLineParameters(java.lang.String[] args)
Parameters:
args - Command line parameters.
public void getSettings(java.lang.String[] args) throws java.lang.Exception
Parameters:
args - Command line parameters.
Throws:
java.lang.Exception
public void loadProperties()
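The method summary describes loadProperties as getting the MorphAdorner properties and adding them to the System properties. A minimal sketch of that general pattern follows, assuming java.util.Properties as a stand-in for UTF8Properties and an in-memory reader in place of the properties URL. The class, method, and property names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical illustration of copying loaded properties into System properties. */
class LoadPropertiesSketch {
    /** Read key=value pairs and copy each one into the JVM-wide System properties. */
    static Properties loadIntoSystemProperties(Reader source) {
        Properties loaded = new Properties();
        try {
            loaded.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String name : loaded.stringPropertyNames()) {
            System.setProperty(name, loaded.getProperty(name));
        }
        return loaded;
    }

    public static void main(String[] args) {
        // Property names below are invented for this example only.
        String text = "sketch.example.outputLemma=true\nsketch.example.kwicWidth=80\n";
        loadIntoSystemProperties(new StringReader(text));
        System.out.println(System.getProperty("sketch.example.kwicWidth"));
    }
}
```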
protected static void help()
public void getOptions() throws java.lang.Exception
Throws:
java.lang.Exception
public void rectifyOptions() throws java.lang.Exception
Throws:
java.lang.Exception
public int setXGOptions() throws java.io.IOException
Throws:
java.io.IOException
protected int entityReferenceHandling()
protected int setDelimiters()
Returns:
0.
protected int setPaths() throws java.io.IOException
Returns:
0 if the Path property has been correctly specified, -1 otherwise.
Throws:
java.io.IOException
protected int setIDs() throws java.io.IOException
Returns:
0 if the Ids property has been correctly specified, -1 otherwise.
Throws:
java.io.IOException
protected int setLogFileNames() throws java.io.IOException
Returns:
0. If the log file name has not been specified (but a log was requested), the file name is input_file.log.
Throws:
java.io.IOException
public java.lang.String stripQuotes(java.lang.String strText)
Parameters:
strText - the initial text.
protected boolean getBooleanProperty(java.lang.String name, boolean defaultValue)
Parameters:
name - Property name.
defaultValue - Default value.
protected int getIntegerProperty(java.lang.String name, int defaultValue)
Parameters:
name - Property name.
defaultValue - Default value.
protected java.lang.String getBooleanStringProperty(java.lang.String name, java.lang.String defaultValue)
Parameters:
name - Property name.
defaultValue - Default value.
protected java.lang.String getStringProperty(java.lang.String name, java.lang.String defaultValue)
Parameters:
name - Property name.
defaultValue - Default value.
public void setXMLWordAttributes(boolean outputOriginalToken, boolean outputLemma, boolean outputStandardSpelling)
Parameters:
outputOriginalToken - true to output original token.
outputLemma - true to output lemma.
outputStandardSpelling - true to output standard spelling.
public java.lang.String getXMLWordAttribute(int attrIndex)
Parameters:
attrIndex - Attribute index.
public java.util.List<java.lang.String> getXMLWordAttributes()
public MorphAdornerSettings getSettings()
public java.lang.String getMorphAdornerVersion()
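The typed property getters and stripQuotes listed above each take a property name and a default value. One plausible shape for such helpers is sketched below, assuming java.util.Properties as the backing store. PropertyGettersSketch and its behavior are assumptions for illustration only (for instance, falling back to the default on an unparsable integer, and stripping a single pair of matching quotes); the MorphAdorner implementation may differ.

```java
import java.util.Properties;

/** Hypothetical typed accessors over a Properties object, with defaults. */
class PropertyGettersSketch {
    final Properties properties = new Properties();

    /** Remove one pair of matching surrounding quotes, if present. */
    static String stripQuotes(String text) {
        if (text != null && text.length() >= 2) {
            char first = text.charAt(0);
            char last = text.charAt(text.length() - 1);
            if (first == last && (first == '"' || first == '\'')) {
                return text.substring(1, text.length() - 1);
            }
        }
        return text;
    }

    /** Return the trimmed, unquoted value, or the default if the name is unset. */
    String getStringProperty(String name, String defaultValue) {
        String value = properties.getProperty(name);
        return (value == null) ? defaultValue : stripQuotes(value.trim());
    }

    /** Parse the value as a boolean, or return the default if the name is unset. */
    boolean getBooleanProperty(String name, boolean defaultValue) {
        String value = getStringProperty(name, null);
        return (value == null) ? defaultValue : Boolean.parseBoolean(value);
    }

    /** Parse the value as an int; unset or malformed values yield the default. */
    int getIntegerProperty(String name, int defaultValue) {
        String value = getStringProperty(name, null);
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        PropertyGettersSketch settings = new PropertyGettersSketch();
        // Hypothetical property names, invented for this example.
        settings.properties.setProperty("example.width", "80");
        settings.properties.setProperty("example.label", "\"lemma\"");
        System.out.println(settings.getIntegerProperty("example.width", 40));
        System.out.println(settings.getStringProperty("example.label", "none"));
    }
}
```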