|
AdornWithNamedEntities
adorns XML texts with named entities such as persons,
locations, times, dates, and organizations. It is an experimental
procedure based upon the Gate named entity extractor ANNIE
with a few modifications to improve its utility for literary
purposes.
Usage:
adornwithnamedentities outputdirectory input1.xml input2.xml ...
where
- outputdirectory -- output directory to receive the XML files
adorned with named entities.
- input*.xml -- input TEI XML files.
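As an illustration of the usage line above, the following is a minimal Python sketch that assembles the argument list for every XML file in a directory and then runs the tool. It assumes that `adornwithnamedentities` is available on the PATH as a launcher script. The helper names `build_command` and `run_adorner` are hypothetical and are not part of MorphAdorner.

```python
import subprocess
from pathlib import Path


def build_command(output_dir, input_dir):
    """Build the argument list: program name, output directory, then inputs.

    Assumption: the launcher is named "adornwithnamedentities" and is on PATH.
    """
    # Collect the input TEI XML files in a stable (sorted) order.
    inputs = sorted(str(p) for p in Path(input_dir).glob("*.xml"))
    return ["adornwithnamedentities", str(output_dir), *inputs]


def run_adorner(output_dir, input_dir):
    """Create the output directory if needed and run the adorner once."""
    command = build_command(output_dir, input_dir)
    if len(command) == 2:
        raise ValueError(f"no .xml files found in {input_dir}")
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    # check=True raises CalledProcessError if the tool exits with an error.
    return subprocess.run(command, check=True)
```

Calling `run_adorner("adorned", "tei")` would then be equivalent to typing the usage line by hand with every `.xml` file in the `tei` directory as input.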
The named entity adorner does not always recognize entities
that cross soft tags. Thus "Emma Woodhouse" may be
recognized as two separate entities. AdornWithNamedEntities
should be run on the input files before their submission to
MorphAdorner.
Gate uses the following XML tags for marking named entities.
AdornWithNamedEntities maps these to the TEI Analytics "<rs>"
element with a specific type= attribute value.
| Gate                                | TEI Analytics            |
| <Date> for a date                   | <rs type="date">         |
| <Location> for a location           | <rs type="location">     |
| <Money> for an amount of money      | <rs type="money">        |
| <Organization> for an organization  | <rs type="organization"> |
| <Person> for a person               | <rs type="person">       |
| <Time> for a time                   | <rs type="time">         |
Gate seems to generate "Date" where one might expect "Time" to appear.
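The tag mapping in the table can be sketched in a few lines of Python. This is an illustration only, not the code AdornWithNamedEntities actually uses: it assumes the Gate tags appear without a namespace prefix, and the names `GATE_TO_RS_TYPE` and `map_gate_tags` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Gate element name -> value of the type= attribute on <rs>,
# taken from the table above.
GATE_TO_RS_TYPE = {
    "Date": "date",
    "Location": "location",
    "Money": "money",
    "Organization": "organization",
    "Person": "person",
    "Time": "time",
}


def map_gate_tags(xml_text):
    """Rename each Gate entity element to <rs type="..."> and return the XML."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        rs_type = GATE_TO_RS_TYPE.get(elem.tag)
        if rs_type is not None:
            # Renaming the tag keeps the element's text and children intact.
            elem.tag = "rs"
            elem.set("type", rs_type)
    return ET.tostring(root, encoding="unicode")
```

For example, `map_gate_tags('<p><Person>Emma Woodhouse</Person> lived in <Location>Highbury</Location>.</p>')` returns `<p><rs type="person">Emma Woodhouse</rs> lived in <rs type="location">Highbury</rs>.</p>`.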
In addition to the named entity types generated by Gate,
AdornWithNamedEntities can also generate <rs type="literary">
for literary references. This has not been fully implemented.
|