public class Sentence
extends java.lang.Object
There are other examples: many routines take a ParseOptions argument and yet use this.opts instead. If the programmer is not careful, odd results may occur.
One last note: everything is referenced via integer indexes into arrays rather than as objects. A natural object-oriented approach would pass the Word object, and the routine would use Word.id to find the offset in the array. This would improve type checking and overall program safety. - jlr
The most important routine is sentence_parse(ParseOptions).
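For orientation, here is a minimal usage sketch built only from the constructor and query methods documented on this page; how the Dictionary and ParseOptions arguments are obtained is not shown here and is assumed to follow those classes' own documentation.

```java
// Minimal sketch: construct a Sentence, run sentence_parse(), read the counters.
// Dictionary/ParseOptions construction is assumed to happen elsewhere.
static void parseOne(String text, Dictionary dict, ParseOptions opts) {
    Sentence sent = new Sentence(text, dict, opts);
    int found = sent.sentence_parse(opts);        // returns num_linkages_found
    if (found > 0) {
        System.out.println("linkages found: " + sent.sentence_num_linkages_found());
        System.out.println("valid linkages: " + sent.sentence_num_valid_linkages());
        System.out.println("words: " + sent.sentence_length());
    }
}
```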
Modifier and Type | Field and Description
---|---
AndData | and_data - used to keep track of fat disjuncts
static int[] | and_element
static int[] | and_element_sizes
static int | CMS_SIZE
static Cms[] | cms_table
static TableConnector[] | ctable - the TableConnector table associated with this sentence instance
static int | ctable_size - the size of this.ctable. TODO - make this Java, not C, and use the collection object stuff, so that ctable_size cannot be modified independently of ctable.
boolean[][] | deletable - deletable regions in a sentence with a conjunction
Dictionary | dict - words are defined from this dictionary
int[][] | effective_dist - created by build_effective_dist()
static boolean[] | has_fat_down
static ImageNode[] | image_array
LinkageInfo[] | link_info - array of valid and invalid linkages (sorted)
static int | match_cost
static MatchNode[][] | match_l_table
static int[] | match_l_table_size
static MatchNode[][] | match_r_table
static int[] | match_r_table_size
static int | N_and_elements
static int | N_changed
static int | N_outside_words
int | null_count - number of null links in linkages
static boolean | null_links
int | num_linkages_alloced - total number of linkages allocated
int | num_linkages_found - total number of linkages before postprocessing
int | num_linkages_post_processed - the number of linkages actually put into the allocated array
int | num_valid_linkages - the number with no post-processing violations
static int[] | outside_word
ParseInfo | parse_info - set of parses for the sentence
PatchElement[] | patch_array
int[] | post_quote
static int | power_cost
static CList[][] | power_l_table
static int[] | power_l_table_size
static int | power_prune_mode
static CList[][] | power_r_table
static int[] | power_r_table_size
boolean | q_pruned_rules - don't prune rules more than once in post-processing
static int | s_table_size
static boolean | structure_violation
static Connector[] | table
static boolean[] | visited
java.util.ArrayList<Word> | word - array of words after tokenization
Constructor and Description |
---|
Sentence(java.lang.String input_string,
Dictionary dict,
ParseOptions opts) |
Modifier and Type | Method and Description |
---|---|
static Disjunct |
add_one_connector(int label,
int dir,
java.lang.String cs,
Disjunct d)
This adds one connector onto the beginning of the left (or right)
connector list of d.
|
static MatchNode |
add_to_left_table_list(MatchNode m,
MatchNode l)
Adds the match node m to the sorted list of match nodes l.
|
static MatchNode |
add_to_right_table_list(MatchNode m,
MatchNode l)
Adds the match node m to the sorted list of match nodes l.
|
LinkageInfo |
analyze_fat_linkage(ParseOptions opts,
int analyze_pass)
This uses link_array.
|
LinkageInfo |
analyze_thin_linkage(ParseOptions opts,
int analyze_pass)
This uses link_array.
|
Disjunct |
build_AND_disjunct_list(java.lang.String s)
Builds and returns a disjunct list for "and", "or" and "nor".
For each disjunct in the label_table we build three disjuncts;
this means that "Danny and Tycho and Billy" will be parsable in
two ways.
|
AndList |
build_andlist()
This function computes the "and cost", resulting from inequalities in the word.size() of
and-list elements.
|
(package private) Disjunct |
build_COMMA_disjunct_list() |
void |
build_conjunction_tables() |
void |
build_deletable(boolean has_conjunction)
Initializes the array deletable[i][j] to indicate whether the words
i+1...j-1 could be non-existent in one of the multiple linkages.
|
Disjunct |
build_disjuncts_for_XNode(ParseOptions opts,
XNode x,
int cost_cutoff) |
void |
build_effective_dist(boolean has_conjunction) |
(package private) Disjunct |
build_fat_link_substitutions(Disjunct d) |
void |
build_image_array() |
boolean |
build_parse_set(int cost,
ParseOptions opts)
This is the top level call that computes the whole parse_set.
|
void |
build_sentence_disjuncts(ParseOptions opts,
int cost_cutoff)
We've already built the sentence expressions.
|
void |
build_sentence_expressions(ParseOptions opts)
Corrects case of first word, fills in other proper nouns, and
builds the expression lists for the resulting words.
|
void |
clean_table(int size,
CList[] t)
This runs through all the connectors in this table, and eliminates those
that are obsolete.
|
void |
clean_up_expressions(int w)
This removes the expressions that are empty from the list corresponding
to word w of the sentence.
|
void |
clean_up(int w)
Step three of the sentence_parse operation - pruning
|
int |
cms_hash(java.lang.String s) |
void |
compute_link_names() |
void |
compute_matchers_for_a_label(int k) |
void |
compute_pp_link_array_connectors(Sublinkage sublinkage)
This takes as input link_array[], sublinkage.link[].l and
sublinkage.link[].r (and also has_fat_down[word], which has been
computed in a prior call to is_canonical()), and from these
computes sublinkage.link[].lc and .rc.
|
void |
compute_pp_link_names(Sublinkage sublinkage)
This fills in the sublinkage.link[].name field.
|
boolean |
conj_in_range(int lw,
int rw)
Determines if there is a conjunction between the supplied right and
left words.
|
void |
conjunction_prune(ParseOptions opts)
We've already built the sentence disjuncts, and we've pruned them
and power_pruned(GENTLE) them also.
|
void |
connector_for_disjunct(Disjunct d,
Connector c) |
static DTypeList |
copy_d_type(DTypeList dtl)
Copy the named Domain Type List and return a copy
|
int |
count_disjuncts_in_sentence() |
int |
count(int lw,
int rw,
Connector le,
Connector re,
int cost,
ParseOptions opts) |
int |
delete_from_cms_table(java.lang.String str) |
void |
delete_unmarked_disjuncts() |
(package private) Disjunct |
explode_disjunct_list(Disjunct d) |
void |
expression_prune(ParseOptions opts) |
void |
extract_all_fat_links(Disjunct d) |
static int |
fast_match_hash(Connector c)
This hash function only looks at the leading upper case letters of
the connector string, and the label fields.
|
void |
fill_patch_array_CON(CONNode cn,
LinksToPatch ltp) |
void |
fill_patch_array_DIS(DISNode dn,
LinksToPatch ltp)
Patches up appropriate links in the patch_array for this DISNode
and this patch list.
|
Disjunct |
find_subdisjunct(Disjunct dis,
int label)
Finds the specific disjunct in label_table[label]
which corresponds to dis.
|
static MatchNode |
form_match_list(int w,
Connector lc,
int lw,
Connector rc,
int rw)
Forms and returns a list of disjuncts that might match lc or rc or both.
|
void |
free_AND_tables() |
void |
free_HT() |
void |
free_LT() |
void |
free_parse_set() |
static void |
free_S() |
void |
free_sentence_disjuncts() |
(package private) static MatchNode |
get_match_node() |
static Disjunct |
glom_aux_connector(Disjunct d,
int label,
boolean necessary)
In this case the connector is to connect to the "either", "neither",
"not", or some auxiliary d to the current, which is a conjunction.
|
static Disjunct |
glom_comma_connector(Disjunct d)
This file contains the functions for massaging disjuncts of the
sentence in special ways having to do with conjunctions.
|
(package private) void |
grow_LT() |
static int |
hash_S(Connector c)
This hash function only looks at the leading upper case letters of
the connector string, and the label fields.
|
static int |
hash(int lw,
int rw,
Connector le,
Connector re,
int cost) |
void |
init_cms_table() |
void |
init_fast_matcher() |
void |
init_HT() |
void |
init_LT() |
void |
init_power()
allocates and builds the initial power hash tables
|
void |
init_table()
A piecewise exponential function determines the size of the hash table.
|
void |
init_x_table()
A piecewise exponential function determines the size of the hash table.
|
void |
initialize_conjunction_tables() |
void |
insert_in_cms_table(java.lang.String str) |
static void |
insert_S(Connector c) |
(package private) void |
install_fat_connectors() |
void |
install_special_conjunctive_connectors() |
static java.lang.String |
intersect_strings(java.lang.String s,
java.lang.String t) |
boolean |
is_appropriate(Disjunct d)
returns true if the disjunct is appropriate to be made into fat links.
|
boolean |
is_canonical_linkage()
uses link_array[], chosen_disjuncts[], has_fat_down[].
|
int |
left_connector_count(Disjunct d)
returns the number of connectors in the left lists of the disjuncts.
|
int |
left_connector_list_update(Connector c,
int word_c,
int w,
boolean shallow)
Takes this connector list and tries to match it with the words
w-1, w-2, w-3... Returns the word to which the first connector of the
list could possibly be matched.
|
static int |
left_disjunct_list_length(Disjunct d) |
boolean |
left_table_search(int w,
Connector c,
boolean shallow,
int word_c) |
Cms |
lookup_in_cms_table(java.lang.String str) |
(package private) void |
mark_region(int lw,
int rw,
Connector le,
Connector re) |
boolean |
match_in_cms_table(java.lang.String pp_match_name) |
boolean |
matches_S(Connector c,
int dir)
Returns true if c can match anything in the set S.
Because of the horrible kludge, prune match is asymmetric, and
direction is '-' if this is a left-to-right pass, and '+' if a right-to-left pass.
|
int |
parse(int cost,
ParseOptions opts)
Returns the number of null links with which the sentence can be parsed at the
specified cost. Assumes that the hash table this.ctable has already been
initialized; it is freed later.
|
boolean |
possible_connection(Connector lc,
Connector rc,
boolean lshallow,
boolean rshallow,
int lword,
int rword)
This takes two connectors (and whether each is shallow or not, and the
two words they came from) and returns true if it is possible for these
two to match based on local considerations.
|
void |
post_process_linkages(ParseOptions opts)
This is another top level call.
|
void |
post_process_scan_linkage(Postprocessor pp,
ParseOptions opts,
Sublinkage sublinkage)
During a first pass (prior to actual post-processing of the linkages
of a sentence), call this once for every generated linkage.
|
PPNode |
post_process(Postprocessor pp,
ParseOptions opts,
Sublinkage sublinkage,
boolean cleanup)
Takes a sublinkage, post-processes it, and returns a PPNode.
|
int |
power_hash(Connector c)
This hash function only looks at the leading upper case letters of
the connector string, and the label fields.
|
int |
power_prune(int mode,
ParseOptions opts)
Here is what you've been waiting for: POWER-PRUNE
|
void |
pp_and_power_prune(int mode,
ParseOptions opts) |
int |
pp_prune(ParseOptions opts) |
void |
prepare_to_parse(ParseOptions opts)
Assumes that the sentence expression lists have been generated;
this does all the necessary pruning and building of "and"
structures.
|
void |
print_AND_statistics(ParseOptions opts) |
void |
print_disjunct_counts(ParseOptions opts) |
void |
print_expression_sizes(ParseOptions opts) |
void |
print_parse_statistics(ParseOptions opts) |
void |
prune_irrelevant_rules(ParseOptions opts,
Postprocessor pp)
call this (a) after having called post_process_scan_linkage() on all
generated linkages, but (b) before calling post_process() on any
particular linkage.
|
void |
prune(ParseOptions opts) |
int |
pseudocount(int lw,
int rw,
Connector le,
Connector re,
int cost) |
void |
put_disjunct_into_table(Disjunct d) |
static void |
put_into_match_table(int size,
MatchNode[] t,
Disjunct d,
Connector c,
int dir)
The disjunct d (whose left or right pointer points to c) is put
into the appropriate hash table
|
void |
put_into_power_table(int size,
CList[] t,
Connector c,
boolean shal)
The disjunct d (whose left or right pointer points to c) is put
into the appropriate hash table
|
int |
region_valid(int lw,
int rw,
Connector le,
Connector re)
CONJUNCTION PRUNING.
|
int |
right_connector_count(Disjunct d)
returns the number of connectors in the right lists of the disjuncts.
|
int |
right_connector_list_update(Connector c,
int word_c,
int w,
boolean shallow)
Takes this connector list and tries to match it with the words
w+1, w+2, w+3... Returns the word to which the first connector of the
list could possibly be matched.
|
static int |
right_disjunct_list_length(Disjunct d)
the number of disjuncts in the list that have non-null
right connector lists
|
boolean |
right_table_search(int w,
Connector c,
boolean shallow,
int word_c) |
boolean |
rule_satisfiable(PPLinkset ls) |
boolean |
sentence_contains_conjunction()
We've already built the sentence expressions.
|
boolean |
sentence_contains(java.lang.String s) |
int |
sentence_disjunct_cost(int i) |
java.lang.String |
sentence_get_word(int index) |
int |
sentence_length()
gets the sentence length, word.size(), in words
|
int |
sentence_null_count() |
int |
sentence_num_linkages_found() |
int |
sentence_num_linkages_post_processed() |
int |
sentence_num_valid_linkages() |
int |
sentence_num_violations(int i) |
int |
sentence_parse(ParseOptions opts)
Step three in parsing a sentence.
|
boolean |
separate_sentence(java.lang.String s,
ParseOptions opts)
The string s has just been read in from standard input.
|
int |
set_dist_fields(Connector c,
int w,
int delta) |
boolean |
set_has_fat_down() |
void |
set_is_conjunction()
How is the is_conjunction table initialized?
TODO - Remove English dependency
Also what about "yet", "however", "then", "else", "whence", "thus", ...
|
int |
size_of_sentence_expressions()
Computes and returns the number of connectors in all of the expressions
of the sentence.
|
int |
size() |
static Disjunct |
special_disjunct(int label,
int dir,
java.lang.String cs,
java.lang.String ds)
Builds a new disjunct with one connector pointing in direction dir
(which is '+' or '-').
|
void |
stick_in_one_connector(java.lang.StringBuffer s,
Connector c,
int len) |
static boolean |
strictly_smaller_name(java.lang.String s,
java.lang.String t) |
boolean |
strictly_smaller(java.lang.String s,
java.lang.String t) |
static int |
table_lookup(int lw,
int rw,
Connector le,
Connector re,
int cost) |
static TableConnector |
table_pointer(int lw,
int rw,
Connector le,
Connector re,
int cost) |
static TableConnector |
table_store(int lw,
int rw,
Connector le,
Connector re,
int cost,
int count)
Stores the value in the table this.ctable.
|
(package private) void |
table_update(int lw,
int rw,
Connector le,
Connector re,
int cost,
int count) |
static void |
zero_S() |
public Dictionary dict
public java.util.ArrayList<Word> word
public boolean[][] deletable
public int[][] effective_dist
build_effective_dist(boolean)
public int num_linkages_found
public int num_linkages_alloced
public int num_linkages_post_processed
public int num_valid_linkages
public int null_count
public ParseInfo parse_info
public LinkageInfo[] link_info
public AndData and_data
public boolean q_pruned_rules
public int[] post_quote
public PatchElement[] patch_array
public static boolean null_links
public static int ctable_size
public static TableConnector[] ctable
public static int match_cost
public static int[] match_l_table_size
public static int[] match_r_table_size
public static MatchNode[][] match_l_table
public static MatchNode[][] match_r_table
public static boolean structure_violation
public static boolean[] visited
public static int[] and_element_sizes
public static int[] and_element
public static int N_and_elements
public static int[] outside_word
public static int N_outside_words
public static boolean[] has_fat_down
public static ImageNode[] image_array
public static int s_table_size
public static Connector[] table
public static int power_cost
public static int power_prune_mode
public static int N_changed
public static int[] power_l_table_size
public static int[] power_r_table_size
public static CList[][] power_l_table
public static CList[][] power_r_table
public static final int CMS_SIZE
public static Cms[] cms_table
public Sentence(java.lang.String input_string, Dictionary dict, ParseOptions opts)
public boolean separate_sentence(java.lang.String s, ParseOptions opts)
s - sentence in String form
opts - passes ParseOptions. In reality these are often kept in global variables. TODO - clean up code
See also: ParseOptions
public void build_sentence_expressions(ParseOptions opts)
Algorithm:
Here's a summary of how subscripts are handled:
Reading the dictionary:
If the last "." in a string is followed by a non-digit character, then the "." and everything after it is considered to be the subscript of the word.
The dictionary reader does not allow you to have two words that match according to the criterion below. (so you can't have "dog.n" and "dog")
Quote marks are used to allow you to define words in the dictionary which would otherwise be considered part of the dictionary, as in
";": {@Xca-} & Xx- & (W+ or Qd+) & {Xx+};
"%" : (ND- & {DD-} & \
Rules for chopping words from the input sentence:
First the prefix chars are stripped off of the word. These
characters are "(" and "$" (and now "``")
Now, repeat the following as long as necessary:
Look up the word in the dictionary.
If it's there, the process terminates.
If it's not there and it ends in one of the right strippable
strings (see "right_strip") then remove the strippable string
and make it into a separate word.
If there is no strippable string, then the process terminates.
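The prefix-stripping and right-stripping loop above can be sketched directly. This is a hypothetical illustration only: the prefix characters and the idea of a right_strip list come from the description above, but the helper names and the dictionary-lookup call are stand-ins, not this class's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: strip prefix chars, then repeatedly peel right-strippable
// suffixes off the end until the remaining word is in the dictionary.
static List<String> chopWord(String w, List<String> rightStrip, java.util.Set<String> dictWords) {
    List<String> pieces = new ArrayList<>();
    while (!w.isEmpty() && (w.startsWith("(") || w.startsWith("$") || w.startsWith("``"))) {
        int n = w.startsWith("``") ? 2 : 1;
        pieces.add(w.substring(0, n));           // prefix char becomes its own token
        w = w.substring(n);
    }
    while (!dictWords.contains(w)) {             // stand-in for the dictionary lookup
        String stripped = null;
        for (String s : rightStrip) {
            if (w.length() > s.length() && w.endsWith(s)) { stripped = s; break; }
        }
        if (stripped == null) break;             // nothing strippable: stop
        pieces.add(1, stripped);                 // strippable string becomes a separate word
        w = w.substring(0, w.length() - stripped.length());
    }
    pieces.add(0, w);                            // hypothetical ordering; for illustration
    return pieces;
}
```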
Rule for defining subscripts in input words:
The subscript rule is followed just as when reading the dictionary.
When does a word in the sentence match a word in the dictionary?
Matching is done as follows: Two words with subscripts must match
exactly. If neither has a subscript they must match exactly. If one
does and one doesn't then they must match when the subscript is
removed. Notice that this is symmetric.
So, under this system, the dictionary could have the words "Ill" and
also the word "Ill." It could also have the word "i.e.", which could be
used in a sentence.
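The subscript-matching rule above is easy to state in code. The helpers below are hypothetical illustrations, not part of this class; they assume the subscript is everything from the last "." that is followed by a non-digit character.

```java
// Hypothetical helper illustrating the matching rule described above.
static String stripSubscript(String w) {
    int dot = w.lastIndexOf('.');
    if (dot >= 0 && dot + 1 < w.length() && !Character.isDigit(w.charAt(dot + 1))) {
        return w.substring(0, dot);
    }
    return w;
}

// Two subscripted words must match exactly; two unsubscripted words must match
// exactly; if exactly one has a subscript, they match once it is removed.
static boolean wordsMatch(String a, String b) {
    boolean aSub = !a.equals(stripSubscript(a));
    boolean bSub = !b.equals(stripSubscript(b));
    if (aSub == bSub) return a.equals(b);
    return stripSubscript(a).equals(stripSubscript(b));
}
// e.g. wordsMatch("dog.n", "dog") == true, wordsMatch("dog.n", "dog.v") == false
```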
opts - not used; everything comes from GlobalBean. TODO - Fix or drop
public int size()
public void initialize_conjunction_tables()
See also: AndData
public void set_is_conjunction()
public int sentence_length()
public void prepare_to_parse(ParseOptions opts)
opts - parsing options
public void conjunction_prune(ParseOptions opts)
opts - parsing options used to set tolerance for nulls
public int region_valid(int lw, int rw, Connector le, Connector re)
lw - integer word number of left wall
rw - integer word number of right wall
le - left expression
re - right expression
public static MatchNode form_match_list(int w, Connector lc, int lw, Connector rc, int rw)
w - array index of word to match
lc - left Connector
lw - index into word array of left word
rc - right Connector
rw - index into word array of right word
static MatchNode get_match_node()
Disjunct build_COMMA_disjunct_list()
void install_fat_connectors()
public void connector_for_disjunct(Disjunct d, Connector c)
d -
c -
public Disjunct build_AND_disjunct_list(java.lang.String s)
must accommodate "he and I are good", "Davy and I are good" "Danny and Davy are good", and reject all of these with "is" instead of "are".
The SI connectors must also be modified to accommodate "are John and Dave here", but kill "is John and Dave here"
Then we consider "a cat or a dog is here" vs "a cat or a dog are here" The first seems right, the second seems wrong. I'll stick with this. That is, "or" has the property that if both parts are the same in number, we use that but if they differ, we use plural.
The connectors on "I" must be handled specially. We accept "I or the dogs are here" but reject "I or the dogs is here"
TODO - the code here still does not work "right", rejecting "is John or I invited" and accepting "I or my friend know what happened". The more generous code for "nor" has been used instead.
It appears that the "nor" of two things can be either singular or plural.
"neither she nor John likes dogs"
"neither she nor John like dogs"
s -
See also: Connector
public void build_conjunction_tables()
public void compute_matchers_for_a_label(int k)
k -
public void stick_in_one_connector(java.lang.StringBuffer s, Connector c, int len)
s -
c -
len -
public void extract_all_fat_links(Disjunct d)
d -
public void put_disjunct_into_table(Disjunct d)
d -
void grow_LT()
public boolean is_appropriate(Disjunct d)
TODO: move to dict
d -
public void init_HT()
public void init_LT()
public void print_AND_statistics(ParseOptions opts)
opts -
public void build_effective_dist(boolean has_conjunction)
has_conjunction -
public void build_deletable(boolean has_conjunction)
TODO - This is awfully ethnocentric. What about other languages, or words like thus, thence, whence etc. This should be a loadable array!
has_conjunction -
public boolean conj_in_range(int lw, int rw)
lw - integer index of left word
rw - integer index of right word
public int sentence_parse(ParseOptions opts)
O.K., that may be true of the C code version, but in this code a lot of information from ParseOptions is held in GlobalBean.
TODO - Make the dictionary and ParseInfo private to the sentence. Then add getter and setter methods.
opts -
See also: num_linkages_found
public int parse(int cost, ParseOptions opts)
cost -
opts -
public int count(int lw, int rw, Connector le, Connector re, int cost, ParseOptions opts)
lw -
rw -
le -
re -
cost -
opts -
public int pseudocount(int lw, int rw, Connector le, Connector re, int cost)
lw -
rw -
le -
re -
cost -
public void init_x_table()
public void init_table()
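The summary above says only that a piecewise exponential function of the sentence length determines the hash-table size for init_table() and init_x_table(). The sketch below illustrates that idea; the breakpoints and sizes are invented for illustration and are not the values this class actually uses.

```java
// Illustrative only: table size grows exponentially with sentence length,
// in pieces, and is capped for very long sentences.
static int tableSizeSketch(int sentenceLength) {
    int log2 = 4;                          // minimum table of 2^4 entries (invented)
    if (sentenceLength >= 10) log2 = 10;   // invented breakpoints
    if (sentenceLength >= 25) log2 = 14;
    if (sentenceLength >= 50) log2 = 16;   // cap
    return 1 << log2;
}
```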
public static int table_lookup(int lw, int rw, Connector le, Connector re, int cost)
lw -
rw -
le -
re -
cost -
See also: TableConnector.lw, TableConnector.rw, TableConnector.le, TableConnector.re, TableConnector.cost
public static int hash(int lw, int rw, Connector le, Connector re, int cost)
lw -
rw -
le -
re -
cost -
See also: TableConnector.lw, TableConnector.rw, TableConnector.le, TableConnector.re, TableConnector.cost
public static TableConnector table_pointer(int lw, int rw, Connector le, Connector re, int cost)
lw -
rw -
le -
re -
cost -
See also: TableConnector.lw, TableConnector.rw, TableConnector.le, TableConnector.re, TableConnector.cost
public static TableConnector table_store(int lw, int rw, Connector le, Connector re, int cost, int count)
lw -
rw -
le -
re -
cost -
count -
See also: TableConnector.lw, TableConnector.rw, TableConnector.le, TableConnector.re, TableConnector.cost, TableConnector.next, init_table()
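Taken together, hash(), table_lookup(), table_pointer() and table_store() act as a memo table for count(), keyed on (lw, rw, le, re, cost). The sketch below illustrates that pattern only; the bucket layout (chaining through TableConnector.next, a power-of-two ctable_size, identity comparison of the Connector endpoints) is an assumption, not a statement about the actual implementation.

```java
// Illustrative memoization pattern; not the actual implementation.
static TableConnector findCount(int lw, int rw, Connector le, Connector re, int cost) {
    int bucket = Sentence.hash(lw, rw, le, re, cost) & (Sentence.ctable_size - 1); // assumes power-of-two size
    for (TableConnector t = Sentence.ctable[bucket]; t != null; t = t.next) {
        // assumes the memo key is compared field-by-field, with Connector identity
        if (t.lw == lw && t.rw == rw && t.le == le && t.re == re && t.cost == cost) {
            return t;    // this region has already been counted
        }
    }
    return null;         // caller computes the count and calls table_store(...)
}
```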
public void init_fast_matcher()
public static int left_disjunct_list_length(Disjunct d)
d -
public static int right_disjunct_list_length(Disjunct d)
d -
public static void put_into_match_table(int size, MatchNode[] t, Disjunct d, Connector c, int dir)
dir = 1, we're putting this into a right table.
dir = -1, we're putting this into a left table.
size -
t -
d -
c -
dir -
public static int fast_match_hash(Connector c)
c -
public static MatchNode add_to_right_table_list(MatchNode m, MatchNode l)
m - the node to add
l - the node to which we are to add m on the right
public static MatchNode add_to_left_table_list(MatchNode m, MatchNode l)
m - the node to add
l - the node to which we are to add m on the right
public boolean build_parse_set(int cost, ParseOptions opts)
It also assumes that count() has been run, and that the hash table is filled with the values thus computed. Therefore this function must be structured just like parse() (the main function for count()).
If the number of linkages gets huge, then the counts can overflow. We check if this has happened when verifying the parse set. This routine returns true iff overflow occurred.
This method modifies the local sentence data and this.parse_info.
cost -
opts -
See also: ParseInfo.verify_set(), Word, parse_info
public void build_sentence_disjuncts(ParseOptions opts, int cost_cutoff)
public Disjunct build_disjuncts_for_XNode(ParseOptions opts, XNode x, int cost_cutoff)
opts - unused; refers to this.cost_cutoff that is set from ParseInfo pi at object creation. TODO - Fix where ParseInfo is kept.
x - the Word expression list node
cost_cutoff -
See also: Word.x
public boolean sentence_contains_conjunction()
public void print_disjunct_counts(ParseOptions opts)
opts -
public void post_process_linkages(ParseOptions opts)
opts -
See also: Linkage.Linkage(int, Sentence, ParseOptions)
public void fill_patch_array_DIS(DISNode dn, LinksToPatch ltp)
dn -
ltp -
public void fill_patch_array_CON(CONNode cn, LinksToPatch ltp)
cn -
ltp -
public LinkageInfo analyze_fat_linkage(ParseOptions opts, int analyze_pass)
opts -
analyze_pass -
See also: ParseInfo
public void post_process_scan_linkage(Postprocessor pp, ParseOptions opts, Sublinkage sublinkage)
pp -
opts -
sublinkage -
public void prune_irrelevant_rules(ParseOptions opts, Postprocessor pp)
opts -
pp -
public PPNode post_process(Postprocessor pp, ParseOptions opts, Sublinkage sublinkage, boolean cleanup)
NB: sublinkage.link[i].l=-1 means that this connector is to be ignored
pp -
opts -
sublinkage -
cleanup -
public void compute_pp_link_array_connectors(Sublinkage sublinkage)
sublinkage -
public void compute_pp_link_names(Sublinkage sublinkage)
sublinkage -
public static DTypeList copy_d_type(DTypeList dtl)
dtl -
public AndList build_andlist()
for a detailed explanation of And
public LinkageInfo analyze_thin_linkage(ParseOptions opts, int analyze_pass)
The code can be used to generate the "islands" array. For this to work, however, you have to call "build_digraph" first (as in analyze_fat_linkage). and then "free_digraph". For some reason this causes a space leak.
opts -
analyze_pass -
public boolean is_canonical_linkage()
See also: AndData
public boolean strictly_smaller(java.lang.String s, java.lang.String t)
s -
t -
public Disjunct find_subdisjunct(Disjunct dis, int label)
dis - a disjunct in the label_table
label - label_table entry containing a disjunct
public void build_image_array()
public int size_of_sentence_expressions()
public void clean_up_expressions(int w)
w -
public void expression_prune(ParseOptions opts)
opts -
public void print_expression_sizes(ParseOptions opts)
opts -
public static void zero_S()
public static void free_S()
public static void insert_S(Connector c)
c -
public static int hash_S(Connector c)
c -
public void print_parse_statistics(ParseOptions opts)
opts -
public boolean matches_S(Connector c, int dir)
c -
dir -
public java.lang.String sentence_get_word(int index)
index -
See also: word
public int sentence_null_count()
public int sentence_num_linkages_found()
public int sentence_num_valid_linkages()
public int sentence_num_linkages_post_processed()
public int sentence_num_violations(int i)
public int sentence_disjunct_cost(int i)
public boolean set_has_fat_down()
public void compute_link_names()
public static boolean strictly_smaller_name(java.lang.String s, java.lang.String t)
public static java.lang.String intersect_strings(java.lang.String s, java.lang.String t)
public void free_sentence_disjuncts()
public void free_HT()
public void free_LT()
public void free_AND_tables()
public void free_parse_set()
public void install_special_conjunctive_connectors()
public boolean sentence_contains(java.lang.String s)
public static Disjunct glom_comma_connector(Disjunct d)
It would be nice if this code was written more transparently. In other words, there should be some fairly general functions that manipulate disjuncts, and take words like "neither" etc as input parameters, so as to encapsulate the changes being made for special words. This would not be too hard to do, but it's not a high priority. -DS 3/98
There's a problem with installing "...but...", "not only...but...", and "not...but...", which is that the current comma mechanism will allow a list separated by commas. "Not only John, Mary but Jim came" The best way to prevent this is to make it impossible for the comma to attach to the "but", via some sort of additional subscript on commas.
I can't think of a good way to prevent this.
The following functions all do slightly different variants of the following thing:
Catenate to the disjunct list pointed to by d a new disjunct list. The new list is formed by copying the old list and adding the new connector somewhere in the old disjunct, for disjuncts that satisfy certain conditions. A sketch of the pattern follows.
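The "copy the old list and add the new connector" operation can be pictured with the small stand-in types below. This is only a sketch of the pattern shared by the glom_* functions; the real Disjunct/Connector classes of this package have different fields and copying rules, and the qualifying condition is left abstract.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types; purely illustrative.
class GlomSketch {
    static class Conn { String name; Conn(String n) { name = n; } }
    static class Disj { List<Conn> left = new ArrayList<>(); }

    /** For every disjunct satisfying some condition, catenate a copy that carries
     *  one extra connector (here pushed onto the front of the left list). */
    static List<Disj> glom(List<Disj> disjuncts, String newConnector) {
        List<Disj> added = new ArrayList<>();
        for (Disj d : disjuncts) {
            if (d.left.isEmpty()) continue;      // "certain conditions" - abstract here
            Disj copy = new Disj();
            copy.left.add(new Conn(newConnector));
            copy.left.addAll(d.left);            // copy of the old connector list
            added.add(copy);
        }
        List<Disj> result = new ArrayList<>(disjuncts);
        result.addAll(added);                    // catenate the new list onto the old one
        return result;
    }
}
```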
public static Disjunct glom_aux_connector(Disjunct d, int label, boolean necessary)
public static Disjunct add_one_connector(int label, int dir, java.lang.String cs, Disjunct d)
public static Disjunct special_disjunct(int label, int dir, java.lang.String cs, java.lang.String ds)
public int pp_prune(ParseOptions opts)
public void pp_and_power_prune(int mode, ParseOptions opts)
public void delete_unmarked_disjuncts()
public void clean_up(int w)
The algorithms in this file prune disjuncts from the disjunct list of the sentence that can be eliminated by simple checks. The first check works as follows:
A series of passes are made through the sentence, alternating left-to-right and right-to-left. Consider the left-to-right pass (the other is symmetric). A set S of connectors is maintained (initialized to be empty). Now the disjuncts of the current word are processed. If a given disjunct's left pointing connectors have the property that at least one of them has no connector in S to which it can be matched, then that disjunct is deleted. Now the set S is augmented by the right connectors of the remaining disjuncts of that word. This completes one word. The process continues through the words from left to right. Alternate passes are made until no disjunct is deleted.
It worries me a little that if there are some really huge disjuncts lists, then this process will probably do nothing. (This fear turns out to be unfounded.)
Notes: Power pruning will not work if applied before generating the "and" disjuncts. This is because certain of its tricks don't work. Think about this, and finish this note later.... Also, currently I use the standard connector match procedure instead of the pruning one, since I know power pruning will not be used before "and" generation. Replace this to allow power pruning to work before generating "and" disjuncts.
Currently it seems that normal pruning, power pruning, "and" generation, pruning, and power pruning (after "and" generation), and parsing take about the same amount of time. This is why doing power pruning before "and" generation might be a very good idea.
New idea: Suppose all the disjuncts of a word have a connector of type c pointing to the right. And further, suppose that there is exactly one word to its right containing that type of connector pointing to the left. Then all the other disjuncts on the latter word can be deleted. (This situation is created by the processing of "either...or", and by the extra disjuncts added to a "," neighboring a conjunction.)
see AndData()
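A single left-to-right pass of the basic pruning algorithm above can be sketched as follows, using stand-in types. The real implementation works on this package's Disjunct/Connector lists and maintains the set S with zero_S(), insert_S() and matches_S(); its connector match is also more permissive than the exact-label comparison used here.

```java
import java.util.ArrayList;
import java.util.List;

class PruneSketch {
    static class Conn { String label; Conn(String l) { label = l; } }
    static class Disj { List<Conn> left = new ArrayList<>(); List<Conn> right = new ArrayList<>(); }

    // Can some connector in S match c? (Exact-label equality is a simplification.)
    static boolean matchable(List<Conn> s, Conn c) {
        for (Conn x : s) if (x.label.equals(c.label)) return true;
        return false;
    }

    /** One left-to-right pass: delete disjuncts whose left connectors cannot all be
     *  matched by S, then add the survivors' right connectors to S. */
    static void leftToRightPass(List<List<Disj>> words) {
        List<Conn> s = new ArrayList<>();              // the set S, initially empty
        for (List<Disj> word : words) {
            word.removeIf(d -> d.left.stream().anyMatch(c -> !matchable(s, c)));
            for (Disj d : word) s.addAll(d.right);     // augment S with right connectors
        }
        // Alternate with a symmetric right-to-left pass until nothing is deleted.
    }
}
```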
public int count_disjuncts_in_sentence()
public int power_prune(int mode, ParseOptions opts)
The kinds of constraints it checks for are the following:
1) successive connectors on the same disjunct have to go to nearer and nearer words.
2) two deep connectors cannot attach to each other (a connector is deep if it is not the first in its list, shallow if it is the first in its list, and deepest if it is the last in its list).
3) on two adjacent words, a pair of connectors can be used only if they're the deepest ones on their disjuncts
4) on two non-adjacent words, a pair of connectors can be used only if not [both of them are the deepest].
The data structure consists of a pair of hash tables on every word. Each bucket of a hash table has a list of pointers to connectors. These nodes also store if the chosen connector is shallow.
As with normal pruning, we make alternate left-to-right and right-to-left passes. In the R-to-L pass, when we're on a word w, we make use of all the left-pointing hash tables on the words to the right of w. After the pruning on this word, we build the left-pointing hash table for this word. This guarantees idempotence of the pass -- after doing an L-to-R pass, doing another would change nothing.
Each connector has an integer c_word field. This refers to the closest word that it could be connected to. These are initially determined by how deep the connector is. For example, a deepest connector can connect to the neighboring word, so its c_word field is w+1 (w-1 if this is a left pointing connector). Its neighboring shallow connector has a c_word value of w+2, etc.
The pruning process adjusts these c_word values as it goes along, accumulating information about any way of linking this sentence. The pruning process stops only after no disjunct is deleted and no c_word values change.
The difference between RUTHLESS and GENTLE power pruning is simply that GENTLE uses the deletable region array, and RUTHLESS does not. So we can get the effect of these two different methods simply by always ensuring that deletable[][] has been defined. With nothing deletable, this is equivalent to RUTHLESS. --DS, 7/97
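Constraints (2)-(4) above amount to a purely local test on a candidate pair of connectors. The sketch below restates them directly; it is an illustration of the stated rules, not the body of possible_connection(), which also uses the connector strings and the c_word distance fields.

```java
// Illustrative only: the shallow/deep and adjacency constraints (2)-(4) above.
// "deepest" is approximated here by a boolean flag per connector.
static boolean depthConstraintsOk(boolean leftShallow, boolean leftDeepest,
                                  boolean rightShallow, boolean rightDeepest,
                                  int leftWord, int rightWord) {
    // (2) two deep connectors cannot attach to each other
    if (!leftShallow && !rightShallow) return false;
    if (rightWord == leftWord + 1) {
        // (3) adjacent words: both connectors must be the deepest on their disjuncts
        return leftDeepest && rightDeepest;
    }
    // (4) non-adjacent words: they must not both be the deepest
    return !(leftDeepest && rightDeepest);
}
```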
public void clean_table(int size, CList[] t)
public int left_connector_list_update(Connector c, int word_c, int w, boolean shallow)
public int right_connector_list_update(Connector c, int word_c, int w, boolean shallow)
public void prune(ParseOptions opts)
public int set_dist_fields(Connector c, int w, int delta)
public boolean possible_connection(Connector lc, Connector rc, boolean lshallow, boolean rshallow, int lword, int rword)
public boolean right_table_search(int w, Connector c, boolean shallow, int word_c)
public boolean left_table_search(int w, Connector c, boolean shallow, int word_c)
public void init_power()
public int left_connector_count(Disjunct d)
public int right_connector_count(Disjunct d)
public int power_hash(Connector c)
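power_hash(), hash_S() and fast_match_hash() are all described as hashing only the leading upper-case letters of the connector string together with the label field. The sketch below shows one way such a hash could look; the mixing constant and the reduction to a bucket index are invented for illustration and are not this class's actual code.

```java
// Illustrative connector hash: fold in the label, then only the leading
// upper-case letters of the connector name, stopping at the first character
// that is not an upper-case letter.
static int connectorHashSketch(String connectorName, int label, int tableSize) {
    int h = label;
    for (int i = 0; i < connectorName.length(); i++) {
        char ch = connectorName.charAt(i);
        if (!Character.isUpperCase(ch)) break;   // ignore everything after the caps
        h = h * 31 + ch;                          // simple polynomial mix (illustrative)
    }
    return (h & 0x7fffffff) % tableSize;          // bucket index
}
```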
public void put_into_power_table(int size, CList[] t, Connector c, boolean shal)
public void init_cms_table()
public int cms_hash(java.lang.String s)
public boolean match_in_cms_table(java.lang.String pp_match_name)
public Cms lookup_in_cms_table(java.lang.String str)
public void insert_in_cms_table(java.lang.String str)
public int delete_from_cms_table(java.lang.String str)
public boolean rule_satisfiable(PPLinkset ls)