NU IT
Northwestern University Information Technology
MorphAdorner Northwestern
 
Converting an adorned file to Sketch engine format

AdornedToSketch converts one or more adorned files to the verticalized input required by the Sketch or NoSketch corpus search engines.

Usage:

adornedtosketch sketchinput.txt corpusname adorned1.xml adorned2.xml ...

where

  • sketchinput.txt specifies the output filename of the verticalized representation required for input to the Sketch or NoSketch engines.
  • corpusname specifies the corpus name to be used when creating the Sketch engine input.
  • adorned1.xml adorned2.xml ... specify the input MorphAdorned XML files from which to produce the Sketch engine input.
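
To give a rough idea of what "verticalized" means, here is a minimal Python sketch of the conversion. It is not the AdornedToSketch implementation. It assumes that adorned tokens appear as w or pc elements carrying pos, lem, and eos attributes, and that each output line holds the word, part of speech, and lemma separated by tabs. The function name adorned_to_vertical, the column order, and the doc and s structure tags are illustrative assumptions; the real program may emit different columns and structures.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def local_name(tag):
    """Strip any XML namespace prefix, e.g. '{ns}w' -> 'w'."""
    return tag.rsplit("}", 1)[-1]


def adorned_to_vertical(xml_text, doc_id):
    """Hypothetical helper: turn adorned XML into one-token-per-line text.

    Assumes tokens are <w> or <pc> elements with 'pos', 'lem' and 'eos'
    attributes; eos="1" is taken to mark the last token of a sentence.
    """
    root = ET.fromstring(xml_text)
    lines = ["<doc id=%s>" % quoteattr(doc_id)]
    in_sentence = False
    for elem in root.iter():
        if local_name(elem.tag) not in ("w", "pc"):
            continue
        token = (elem.text or "").strip()
        if not token:
            continue
        if not in_sentence:
            lines.append("<s>")
            in_sentence = True
        # Assumed column order: word form, part of speech, lemma.
        lines.append("\t".join([token, elem.get("pos", ""), elem.get("lem", token)]))
        if elem.get("eos") == "1":
            lines.append("</s>")
            in_sentence = False
    if in_sentence:
        lines.append("</s>")
    lines.append("</doc>")
    return "\n".join(lines) + "\n"


# Tiny made-up input, not taken from a real adorned file.
sample = (
    '<text><w lem="go" pos="vvb" eos="0">Go</w>'
    '<w lem="home" pos="n1" eos="0">home</w>'
    '<pc eos="1">.</pc></text>'
)
print(adorned_to_vertical(sample, "demo"))
```

Each token ends up on its own line, with sentence and document boundaries marked by structure tags on separate lines.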

Known flaw: AdornedToSketch does not generate the "glue" elements which bind punctuation marks to word tokens. Searching the corpus still works fine in the Sketch or NoSketch engine, but the punctuation marks are displayed detached from any token to which they would normally be attached.
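
One possible workaround is to post-process the verticalized file and insert the glue tag yourself. The Python sketch below is an assumption-laden illustration, not part of MorphAdorner: the function name add_glue and the set of closing punctuation marks are hypothetical, and it assumes that glue is written as a g element on a line of its own directly before the token to be attached. It handles only punctuation that attaches to the preceding token.

```python
# Punctuation assumed to attach to the preceding token (illustrative set).
CLOSING_PUNCT = {".", ",", ";", ":", "!", "?", ")"}


def add_glue(vertical_lines):
    """Hypothetical helper: insert '<g/>' before closing punctuation.

    Lines starting with '<' are treated as structure tags, so a token
    that itself begins with '<' would be misclassified by this sketch.
    """
    out = []
    prev_was_token = False
    for line in vertical_lines:
        is_tag = line.startswith("<")
        word = line.split("\t", 1)[0]
        # Glue only when punctuation directly follows another token.
        if not is_tag and word in CLOSING_PUNCT and prev_was_token:
            out.append("<g/>")
        out.append(line)
        prev_was_token = not is_tag
    return out


# Made-up vertical input for demonstration.
example = ["<s>", "Go\tvvb\tgo", "home\tn1\thome", ".\t\t.", "</s>"]
for line in add_glue(example):
    print(line)
```

Opening punctuation such as left parentheses and opening quotation marks would need the glue placed after the mark instead, which this sketch does not attempt.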

The Sketch engine, and its simpler sibling the NoSketch engine, are corpus query systems based upon the thesis work of Pavel Rychlý. The engines are products of Lexical Computing Ltd., headed by computational linguist Adam Kilgarriff.
