Morphological Adorner

MorphAdorner adorns words in text with morphological tags.

Package Description
Ngram-based language detection methods for text.
Utilities used by the language detection methods.
SAX-based XML output filters.
Contains the interface for language profiles for the Cybozu Language Detector as well as the profiles themselves.
MorphAdorner adorns texts with word-based morphological information such as parts of speech and lemmata.
Adorned Word.
Tokens which start or end with apostrophes.
Contraction Expander.
Syllable counter.
Text inputter for MorphAdorner.
Language recognizer.
Lexicon of spelling, lemmata, and parts of speech.
Classes for extracting and manipulating multiword units.
Finds named entities in text.
Name standardizer.
Classes for creating and manipulating word ngrams.
Output generation for adorned text.
Classes and methods for manipulating part of speech tags.
Classes and methods for mapping one part of speech tag set to another.
Classes for generating phonetic values for strings.
Methods and interfaces for part of speech tagging and lemmatization.
Affix part of speech tagger.
All unknown part of speech tagger.
Bigram part of speech tagger.
Hybrid bigram part of speech tagger.
Guesses parts of speech for unknown words.
Implements Mark Hepple's part of speech tagger.
Implements tagging rules for Mark Hepple's part of speech tagger.
Retagger to correct "I" tagging issues.
Retagger which leaves initial tagging undisturbed.
Retagger to correct proper noun tagging issues.
Regular expression-based part of speech tagger.
Simple part of speech tagger.
Simple rule-based part of speech tagger.
Methods and interfaces for lexical and contextual smoothing for part of speech taggers.
Suffix part of speech tagger.
Retagger to correct TCP text issues.
Transition matrix.
Trigram part of speech tagger.
Hybrid trigram part of speech tagger.
Unigram part of speech tagger.
Melds a list of words and punctuation into formatted sentences.
Splits text into sentences.
BritishToUS is a simple filter which maps British spellings to American (US) spellings.
Spelling standardization.
Methods and interfaces for statistical methods useful in corpus linguistics.
Stop words.
Methods for computing the similarity of strings.
Syllable counter.
Text Segmentation.
C99 text segmentation.
Utilities for linear text segmentation.
TextTiling text segmentation.
Text Summarization.
Text tokenization.
Word Counts.
Example programs using MorphAdorner facilities.
GATE interfaces for MorphAdorner components.
Utility classes for TEI XML processing.
Contains a variety of utility tools for creating and manipulating data files for use with MorphAdorner.
Create derived MorphAdorner files with character offsets to word tokens.
Adds pseudopage milestones to an adorned file.
AdornedToSimpleTEIP5 converts a base-level MorphAdorner file to a more TEI P5-like format.
AdornedToSketch converts one or more adorned files to the verticalized input required by the Sketch or NoSketch corpus search engines.
AdornedToTCF04 converts one or more adorned files to the Text Corpus Format (TCF) v0.4 used by the CLARIN-D project.
Utilities for merging Annolex generated corrections with adorned XML files.
Applies XSLT transformation to one or more files.
Classes and utilities for comparing token streams in adorned files and logging the differences to XML format files.
Compare string counts in two files using Dunning's log-likelihood.
Counts adorned words by processing XMLToTab output.
Counts affixes (suffixes and prefixes) of adorned words by processing MorphAdorner XML output.
Generates a MorphAdorner lexicon from training data.
Generates a MorphAdorner suffix lexicon from a word lexicon.
Determines the language(s) in which a TEI text is written.
Fix quote marks in text and XML files.
Link grammar parser driver.
Merges Brill-style lexicon with MorphAdorner lexicon.
Merges enhanced Brill-style lexicon with MorphAdorner lexicon.
Merges multiple spelling map word lists into a single file.
Merges multiple text files into a single file.
Merges multiple word list files into a single file.
AdornWithNamedEntities adorns texts with named entities such as person, location, time, date, and organization.
PunktAbbreviationDetector uses the Punkt algorithm of Kiss and Strunk to decide whether a token containing one or more periods is an abbreviation.
Update lemmata and standard spellings in MorphAdorned XML files.
Utilities to extract random or exact size samples from a text file.
Create derived MorphAdorner file with word elements stripped of attributes.
Compares training data to adorner output.
Training programs for part of speech taggers.
Generates transition matrices from training data for hidden Markov model part of speech taggers.
The tcp package contains utilities aimed at processing Text Creation Partnership texts.
Unadorn removes word level adornments from adorned files.
Validate XML files.
Utilities to convert MorphAdorned XML files to tab-separated tabular form.
Supervises adornment of XML texts.
Reusable utilities, primarily non-visual.
Cache utilities.
Reading and writing delimiter separated files.
Classes for databases using MySQL.
Utilities for processing HTML text.
Logging utilities.
Reusable utilities for mathematics and arithmetic.
Methods for computing point probabilities and percentage points of common statistical distributions.
Implements the Mersenne Twister random number generator as well as methods for generating random numbers from a variety of statistical distributions.
Methods and interfaces for finding roots (zeroes) of functions.
Reusable utilities for statistics.
MIME utilities.
A Java comment-based source preprocessor.
Reusable utilities for servlets.
Provides classes and methods for accessing spelling dictionaries and performing spell checking.
Programs to create spelling dictionaries for use with the spellcheck classes.
Reusable XML utilities.
Reusable JDOM XML utilities.
Jargs GNU Command Line Parser.
JLinkGrammar is a Java port of the Carnegie Mellon University link grammar parser, a syntactic parser for English.
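
One of the tools listed above compares string counts in two files using Dunning's log-likelihood. As a rough illustration of that statistic only, the following minimal sketch computes the two-corpus log-likelihood value for a single string from its count in each file and the total count of strings in each file. The class and method names here are hypothetical and are not taken from MorphAdorner; the actual tool may organize the computation differently.

```java
// Hypothetical illustration; not part of the MorphAdorner API.
public class LogLikelihoodSketch {

    /** Returns observed * ln(observed / expected), treating 0 * ln(0) as 0. */
    static double term(double observed, double expected) {
        return observed == 0.0 ? 0.0 : observed * Math.log(observed / expected);
    }

    /**
     * Log-likelihood (G2) for one string occurring countA times in a file of
     * sizeA strings and countB times in a file of sizeB strings.
     */
    static double logLikelihood(long countA, long sizeA, long countB, long sizeB) {
        double total = (double) sizeA + sizeB;
        double combined = (double) countA + countB;
        // Expected counts if the string were equally frequent in both files.
        double expectedA = sizeA * combined / total;
        double expectedB = sizeB * combined / total;
        return 2.0 * (term(countA, expectedA) + term(countB, expectedB));
    }

    public static void main(String[] args) {
        // Made-up counts, for demonstration only.
        System.out.println(logLikelihood(120, 100000, 40, 100000));
    }
}
```

A larger value indicates a larger difference in relative frequency between the two files. A value of zero means the string is equally frequent in both.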