MorphAdorner: Generating Tag Transition Probabilities

NGramTaggerTrainer generates the tag transition probabilities used by the MorphAdorner bigram and trigram taggers. It reads part of speech training data and a word lexicon, and writes a tag transition matrix file.

Usage:

ngramtaggertrainer trainingdata.tab wordlexicon.lex transitionmatrix.mat

where

trainingdata.tab -- input training data file.
wordlexicon.lex -- input MorphAdorner lexicon.
transitionmatrix.mat -- output tag transition matrix file.

The training data file is a tab-separated UTF-8 file containing the part of speech training data generated from the training texts. Only the first two columns of the training data are used:

The original token (spelling).
The NUPOS part of speech.
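
To illustrate the column layout described above, the following Python sketch reads tab-separated UTF-8 text and keeps only the first two columns. The function name `read_training_pairs`, the decision to skip lines with fewer than two columns, and the sample tokens and tags are assumptions made for this example; they are not part of MorphAdorner.

```python
import csv
import io

def read_training_pairs(stream):
    """Yield (token, pos_tag) pairs from a tab-separated text stream.

    Only the first two columns are used; any further columns are ignored.
    Lines with fewer than two columns (including blank lines) are skipped,
    which is an assumption made for this sketch.
    """
    # QUOTE_NONE keeps quotation marks in tokens as ordinary characters.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            continue
        yield row[0], row[1]

# Hypothetical sample data; the third column stands in for any extra columns.
sample = io.StringIO("The\tdt\textra\ndog\tn1\textra\n\nbarks\tvvz\n")
pairs = list(read_training_pairs(sample))
```

For a real file, the same function could be given a file object opened with `open(path, encoding="utf-8", newline="")`.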

The word lexicon is a word lexicon file in MorphAdorner format.

The output tag transition file is a UTF-8 file containing the tag transition data needed by the MorphAdorner bigram and trigram taggers.
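
The layout of the transition matrix file is not described here. As a rough sketch of the underlying idea only, the following Python code estimates bigram and trigram tag transition probabilities from a sequence of tags by relative frequency. The function name `transition_probabilities`, the unsmoothed estimate, and the treatment of the input as one continuous tag sequence are all assumptions for illustration; MorphAdorner's own computation may differ, for example by applying smoothing or by using the lexicon.

```python
from collections import Counter

def transition_probabilities(tags):
    """Return (bigram, trigram) relative-frequency estimates.

    bigram[(t1, t2)]      estimates P(t2 | t1)
    trigram[(t1, t2, t3)] estimates P(t3 | t1, t2)
    No smoothing is applied (an assumption made for this sketch).
    """
    bigram_counts = Counter(zip(tags, tags[1:]))
    trigram_counts = Counter(zip(tags, tags[1:], tags[2:]))

    # Context counts: how often each history is followed by another tag.
    unigram_context = Counter(tags[:-1])
    bigram_context = Counter(zip(tags[:-2], tags[1:-1]))

    bigram = {
        key: count / unigram_context[key[0]]
        for key, count in bigram_counts.items()
    }
    trigram = {
        key: count / bigram_context[key[:2]]
        for key, count in trigram_counts.items()
    }
    return bigram, trigram

# Hypothetical tag sequence, used only to exercise the function.
tags = ["dt", "n1", "vvz", "dt", "j", "n1"]
bigram, trigram = transition_probabilities(tags)
```

In this sample, the tag `dt` is followed once by `n1` and once by `j`, so each of those two transitions receives an estimated probability of 0.5.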
