Northwestern University Information Technology
MorphAdorner Northwestern
 
MorphAdorner Server Services: Sentence Splitter Service

Service name: sentencesplitter
Service description: Splits plain text into sentences.
HTTP methods allowed: GET, POST, OPTIONS
POST accepts as input: application/x-www-form-urlencoded
HTTP return codes:
    200: service succeeded
    400: service failed with an error

Query parameters

    corpusConfig: Corpus configuration name. In the standard distribution these are eme (Early Modern English), ece (Eighteenth Century English), and ncf (Nineteenth Century Fiction).
    media: Result format. One of json, xml, html, or text.
    text: Text to be processed.
    includeInputText: Set to true to include the input text in the output, or false to omit it.
    langCode: ISO language code, two or three characters long. The default is en (English). In the sample form below, the *** Detect *** option submits an empty value, which asks the server to try to determine the language from the text provided.
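
The parameters above can be assembled into a form-encoded POST request. The following Python sketch only builds the request and does not send it. The base URL is a placeholder and an assumption, since this page does not state where a given server is hosted; the function name build_split_request is also just illustrative.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder address (an assumption): replace with your own MorphAdorner server.
BASE_URL = "http://morphadorner.example.org"


def build_split_request(text, corpus_config="ncf", media="json",
                        include_input_text=True, lang_code="en",
                        base_url=BASE_URL):
    """Build, but do not send, a POST request for the sentencesplitter service."""
    params = {
        "text": text,
        "corpusConfig": corpus_config,
        "media": media,
        "includeInputText": "true" if include_input_text else "false",
        "langCode": lang_code,
    }
    # POST input must be application/x-www-form-urlencoded.
    body = urlencode(params).encode("utf-8")
    return Request(
        f"{base_url}/sentencesplitter",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


request = build_split_request("We are met on a great battle-field of that war.")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

To actually call a server, the request object could be passed to urllib.request.urlopen; a 400 response would then surface as an HTTPError.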

Sample POST form

<form accept-charset="UTF-8" method="post" action="sentencesplitter"
      target="_blank"
      name="sentencesplitter">
<table cellpadding="0" cellspacing="5">
<tr>
<td><strong>Text:</strong></td>
<td colspan="2">
<textarea name="text" rows="15" cols="76"></textarea>
</td>
</tr>
<tr>
<td valign="top">
<strong>
Lexicon:</strong>
</td>
<td>
<input type="radio" name="corpusConfig" value="eme">Early Modern English</input><br />
<input type="radio" name="corpusConfig" value="ece">Eighteenth Century English</input><br />
<input type="radio" name="corpusConfig" value="ncf" checked="checked">Nineteenth Century Fiction</input>
</td>
</tr>
<tr>
<td><strong>Language:</strong></td>
<td>
<select name="langCode">
<option value="en" selected="selected">English</option>
<option value="">*** Detect ***</option>
<option value="af">Afrikaans</option>
<option value="ak">Akan</option>
<option value="sq">Albanian</option>
<option value="am">Amharic</option>
<option value="ar">Arabic</option>
<option value="hy">Armenian</option>
<option value="as">Assamese</option>
<option value="az">Azerbaijani</option>
<option value="bm">Bambara</option>
<option value="bas">Basa</option>
<option value="eu">Basque</option>
<option value="be">Belarusian</option>
<option value="bem">Bemba</option>
<option value="bn">Bengali</option>
<option value="bs">Bosnian</option>
<option value="br">Breton</option>
<option value="bg">Bulgarian</option>
<option value="my">Burmese</option>
<option value="ca">Catalan</option>
<option value="chr">Cherokee</option>
<option value="zh">Chinese</option>
<option value="kw">Cornish</option>
<option value="hr">Croatian</option>
<option value="cs">Czech</option>
<option value="da">Danish</option>
<option value="dua">Duala</option>
<option value="nl">Dutch</option>
<option value="eo">Esperanto</option>
<option value="et">Estonian</option>
<option value="ee">Ewe</option>
<option value="ewo">Ewondo</option>
<option value="fo">Faroese</option>
<option value="fil">Filipino</option>
<option value="fi">Finnish</option>
<option value="fr">French</option>
<option value="ff">Fulah</option>
<option value="gl">Gallegan</option>
<option value="lg">Ganda</option>
<option value="ka">Georgian</option>
<option value="de">German</option>
<option value="el">Greek</option>
<option value="kl">Greenlandic</option>
<option value="gu">Gujarati</option>
<option value="ha">Hausa</option>
<option value="haw">Hawaiian</option>
<option value="iw">Hebrew</option>
<option value="hi">Hindi</option>
<option value="hu">Hungarian</option>
<option value="is">Icelandic</option>
<option value="ig">Igbo</option>
<option value="in">Indonesian</option>
<option value="ga">Irish</option>
<option value="it">Italian</option>
<option value="ja">Japanese</option>
<option value="kab">Kabyle</option>
<option value="kam">Kamba</option>
<option value="kn">Kannada</option>
<option value="kk">Kazakh</option>
<option value="km">Khmer</option>
<option value="ki">Kikuyu</option>
<option value="rw">Kinyarwanda</option>
<option value="kok">Konkani</option>
<option value="ko">Korean</option>
<option value="lv">Latvian</option>
<option value="ln">Lingala</option>
<option value="lt">Lithuanian</option>
<option value="lu">Luba-Katanga</option>
<option value="mk">Macedonian</option>
<option value="mg">Malagasy</option>
<option value="ms">Malay</option>
<option value="ml">Malayalam</option>
<option value="mt">Maltese</option>
<option value="gv">Manx</option>
<option value="mr">Marathi</option>
<option value="mas">Masai</option>
<option value="ne">Nepali</option>
<option value="nd">North Ndebele</option>
<option value="nb">Norwegian Bokmål</option>
<option value="nn">Norwegian Nynorsk</option>
<option value="nyn">Nyankole</option>
<option value="or">Oriya</option>
<option value="om">Oromo</option>
<option value="pa">Panjabi</option>
<option value="fa">Persian</option>
<option value="pl">Polish</option>
<option value="pt">Portuguese</option>
<option value="ps">Pushto</option>
<option value="rm">Raeto-Romance</option>
<option value="ro">Romanian</option>
<option value="rn">Rundi</option>
<option value="ru">Russian</option>
<option value="sg">Sango</option>
<option value="sr">Serbian</option>
<option value="sn">Shona</option>
<option value="ii">Sichuan Yi</option>
<option value="si">Sinhalese</option>
<option value="sk">Slovak</option>
<option value="sl">Slovenian</option>
<option value="so">Somali</option>
<option value="es">Spanish</option>
<option value="sw">Swahili</option>
<option value="sv">Swedish</option>
<option value="gsw">Swiss German</option>
<option value="ta">Tamil</option>
<option value="te">Telugu</option>
<option value="th">Thai</option>
<option value="bo">Tibetan</option>
<option value="ti">Tigrinya</option>
<option value="to">Tonga</option>
<option value="tr">Turkish</option>
<option value="uk">Ukrainian</option>
<option value="ur">Urdu</option>
<option value="uz">Uzbek</option>
<option value="vai">Vai</option>
<option value="vi">Vietnamese</option>
<option value="cy">Welsh</option>
<option value="yo">Yoruba</option>
<option value="zu">Zulu</option>
</select>
</td>
</tr>
<tr>
<td>&nbsp;</td>
<td>
<input type="checkbox" name="includeInputText" value="true"
       checked="checked"/>
Include input text in results
</td>
</tr>
<tr>
<td>
&nbsp;
</td>
<td>
&nbsp;
</td>
</tr>
<tr>
<td valign="top">
<strong>Results format:</strong>
</td>
<td>
<input type="radio" name="media" value="json">JSON format</input><br />
<input type="radio" name="media" value="xml" checked="checked">XML format</input><br />
<input type="radio" name="media" value="html">HTML format</input><br />
<input type="radio" name="media" value="text">Text format</input>
</td>
</tr>
<tr>
<td>
&nbsp;
</td>
<td>
&nbsp;
</td>
</tr>
<tr>
<td colspan="2">
<input type="submit" name="split" value="Split" />
</td>
</tr>
</table>
</form>

Output

Here we split a paragraph from Lincoln's "Gettysburg Address" into sentences.

Now we are engaged in a great civil war, testing whether that nation, or any nation, so conceived and so dedicated, can long endure. We are met on a great battle-field of that war. We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live. It is altogether fitting and proper that we should do this.

The JSON and XML SentenceSplitterResult echoes the input text (when includeInputText is true), the ISO language code langCode, and the corpusConfig. The sentences container wraps a sequence of sentence entries, each of which represents a single sentence parsed from the input text. Each sentence contains a sequence of token entries representing the words and punctuation in that sentence. The meldedSentences container wraps a sequence of meldedSentence entries, each of which contains a single untokenized sentence. The HTML and text versions provide displayable versions of the extracted sentences.

JSON output

{
  "SentenceSplitterResult": {
    "text": "Now we are engaged in a great civil war, testing whether that  nation, or any nation, so conceived and so dedicated, can long  endure. We are met on a great battle-field of that war. We have  come to dedicate a portion of that field, as a final resting  place for those who here gave their lives that that nation might  live. It is altogether fitting and proper that we should do  this.",
    "langCode": "en",
    "corpusConfig": "ncf",
    "sentences": [
      {
        "sentence": [
          {
            "token": [
              "Now",
              "we",
              "are",
              "engaged",
              "in",
              "a",
              "great",
              "civil",
              "war",
              ",",
              "testing",
              "whether",
              "that",
              "nation",
              ",",
              "or",
              "any",
              "nation",
              ",",
              "so",
              "conceived",
              "and",
              "so",
              "dedicated",
              ",",
              "can",
              "long",
              "endure",
              "."
            ]
          },
          {
            "token": [
              "We",
              "are",
              "met",
              "on",
              "a",
              "great",
              "battle-field",
              "of",
              "that",
              "war",
              "."
            ]
          },
          {
            "token": [
              "We",
              "have",
              "come",
              "to",
              "dedicate",
              "a",
              "portion",
              "of",
              "that",
              "field",
              ",",
              "as",
              "a",
              "final",
              "resting",
              "place",
              "for",
              "those",
              "who",
              "here",
              "gave",
              "their",
              "lives",
              "that",
              "that",
              "nation",
              "might",
              "live",
              "."
            ]
          },
          {
            "token": [
              "It",
              "is",
              "altogether",
              "fitting",
              "and",
              "proper",
              "that",
              "we",
              "should",
              "do",
              "this",
              "."
            ]
          }
        ]
      }
    ],
    "meldedSentences": [
      {
        "meldedSentence": [
          "Now we are engaged in a great civil war, testing whether that nation, or any nation, so conceived and so dedicated, can long endure.",
          "We are met on a great battle-field of that war.",
          "We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live.",
          "It is altogether fitting and proper that we should do this."
        ]
      }
    ]
  }
}
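
A client might unpack this JSON layout as in the following Python sketch. The sample string is a shortened stand-in with the same shape as the output above, not real service output, and the function names are illustrative. The sketch assumes the arrays are always nested exactly as shown; how the service represents a result with a single sentence or a single token is not documented here.

```python
import json


def melded_sentences(result_json):
    """Return the untokenized sentences from a SentenceSplitterResult JSON string."""
    result = json.loads(result_json)["SentenceSplitterResult"]
    return [
        sentence
        for group in result["meldedSentences"]
        for sentence in group["meldedSentence"]
    ]


def tokenized_sentences(result_json):
    """Return one list of tokens per sentence."""
    result = json.loads(result_json)["SentenceSplitterResult"]
    return [
        entry["token"]
        for group in result["sentences"]
        for entry in group["sentence"]
    ]


# Shortened stand-in for the service output (same nesting, fewer tokens).
sample_json = """
{
  "SentenceSplitterResult": {
    "langCode": "en",
    "corpusConfig": "ncf",
    "sentences": [
      {"sentence": [
        {"token": ["We", "are", "met", "."]},
        {"token": ["It", "is", "fitting", "."]}
      ]}
    ],
    "meldedSentences": [
      {"meldedSentence": ["We are met.", "It is fitting."]}
    ]
  }
}
"""

for number, sentence in enumerate(melded_sentences(sample_json), start=1):
    print(number, sentence)
print(tokenized_sentences(sample_json))
```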

XML output

<?xml version="1.0"?>
<SentenceSplitterResult>
    <text>Now we are engaged in a great civil war, testing whether that  nation, or any nation, so conceived and so dedicated, can long  endure. We are met on a great battle-field of that war. We have  come to dedicate a portion of that field, as a final resting  place for those who here gave their lives that that nation might  live. It is altogether fitting and proper that we should do  this.</text>
    <langCode>en</langCode>
    <corpusConfig>ncf</corpusConfig>
    <sentences>
        <sentence>
            <token>Now</token>
            <token>we</token>
            <token>are</token>
            <token>engaged</token>
            <token>in</token>
            <token>a</token>
            <token>great</token>
            <token>civil</token>
            <token>war</token>
            <token>,</token>
            <token>testing</token>
            <token>whether</token>
            <token>that</token>
            <token>nation</token>
            <token>,</token>
            <token>or</token>
            <token>any</token>
            <token>nation</token>
            <token>,</token>
            <token>so</token>
            <token>conceived</token>
            <token>and</token>
            <token>so</token>
            <token>dedicated</token>
            <token>,</token>
            <token>can</token>
            <token>long</token>
            <token>endure</token>
            <token>.</token>
        </sentence>
        <sentence>
            <token>We</token>
            <token>are</token>
            <token>met</token>
            <token>on</token>
            <token>a</token>
            <token>great</token>
            <token>battle-field</token>
            <token>of</token>
            <token>that</token>
            <token>war</token>
            <token>.</token>
        </sentence>
        <sentence>
            <token>We</token>
            <token>have</token>
            <token>come</token>
            <token>to</token>
            <token>dedicate</token>
            <token>a</token>
            <token>portion</token>
            <token>of</token>
            <token>that</token>
            <token>field</token>
            <token>,</token>
            <token>as</token>
            <token>a</token>
            <token>final</token>
            <token>resting</token>
            <token>place</token>
            <token>for</token>
            <token>those</token>
            <token>who</token>
            <token>here</token>
            <token>gave</token>
            <token>their</token>
            <token>lives</token>
            <token>that</token>
            <token>that</token>
            <token>nation</token>
            <token>might</token>
            <token>live</token>
            <token>.</token>
        </sentence>
        <sentence>
            <token>It</token>
            <token>is</token>
            <token>altogether</token>
            <token>fitting</token>
            <token>and</token>
            <token>proper</token>
            <token>that</token>
            <token>we</token>
            <token>should</token>
            <token>do</token>
            <token>this</token>
            <token>.</token>
        </sentence>
    </sentences>
    <meldedSentences>
        <meldedSentence>Now we are engaged in a great civil war, testing whether that nation, or any nation, so conceived and so dedicated, can long endure.</meldedSentence>
        <meldedSentence>We are met on a great battle-field of that war.</meldedSentence>
        <meldedSentence>We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live.</meldedSentence>
        <meldedSentence>It is altogether fitting and proper that we should do this.</meldedSentence>
    </meldedSentences>
</SentenceSplitterResult>
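
The XML form can be read with Python's standard xml.etree.ElementTree module, as in the sketch below. The sample document is a shortened stand-in with the same element names as the output above, and parse_splitter_xml is an illustrative name rather than part of MorphAdorner.

```python
import xml.etree.ElementTree as ET


def parse_splitter_xml(xml_text):
    """Extract language, corpus configuration, tokens, and melded sentences."""
    root = ET.fromstring(xml_text)  # root element is <SentenceSplitterResult>
    return {
        "langCode": root.findtext("langCode"),
        "corpusConfig": root.findtext("corpusConfig"),
        # One list of token strings per <sentence> element.
        "sentences": [
            [token.text for token in sentence.findall("token")]
            for sentence in root.findall("sentences/sentence")
        ],
        "melded": [
            melded.text
            for melded in root.findall("meldedSentences/meldedSentence")
        ],
    }


# Shortened stand-in for the service output (same element names, fewer tokens).
sample_xml = """\
<?xml version="1.0"?>
<SentenceSplitterResult>
    <langCode>en</langCode>
    <corpusConfig>ncf</corpusConfig>
    <sentences>
        <sentence><token>We</token><token>are</token><token>met</token><token>.</token></sentence>
        <sentence><token>It</token><token>is</token><token>fitting</token><token>.</token></sentence>
    </sentences>
    <meldedSentences>
        <meldedSentence>We are met.</meldedSentence>
        <meldedSentence>It is fitting.</meldedSentence>
    </meldedSentences>
</SentenceSplitterResult>
"""

parsed = parse_splitter_xml(sample_xml)
print(parsed["langCode"], parsed["corpusConfig"])
print(parsed["melded"])
```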

HTML output (source)

<h3>4 sentences found.</h3>
<table border="0">
<tr>
<th align="left">S#</th>
<th align="left">Sentence</th>
</tr>
<tr>
<td valign="top" align="left"><strong>1</strong></td>
<td valign="top" align="left">Now we are engaged in a great civil war, testing whether that nation, or any nation, so conceived and so dedicated, can long endure.</td>
</tr>
<tr>
<td valign="top" align="left"><strong>2</strong></td>
<td valign="top" align="left">We are met on a great battle-field of that war.</td>
</tr>
<tr>
<td valign="top" align="left"><strong>3</strong></td>
<td valign="top" align="left">We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live.</td>
</tr>
<tr>
<td valign="top" align="left"><strong>4</strong></td>
<td valign="top" align="left">It is altogether fitting and proper that we should do this.</td>
</tr>
</table>

HTML output (display)

4 sentences found.

S# Sentence
1 Now we are engaged in a great civil war, testing whether that nation, or any nation, so conceived and so dedicated, can long endure.
2 We are met on a great battle-field of that war.
3 We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live.
4 It is altogether fitting and proper that we should do this.

Text output

4 sentences found.
S#	Sentence
1	Now we are engaged in a great civil war, testing whether that nation, or any nation, so conceived and so dedicated, can long endure.
2	We are met on a great battle-field of that war.
3	We have come to dedicate a portion of that field, as a final resting place for those who here gave their lives that that nation might live.
4	It is altogether fitting and proper that we should do this.
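
The text format is the simplest to consume line by line. The Python sketch below assumes, based on the sample above, that each data row is a sentence number, a tab, and the sentence, and that the count line and the header row can be skipped. The function name parse_text_output is illustrative.

```python
def parse_text_output(text_output):
    """Return (sentence number, sentence) pairs from the text result format."""
    rows = []
    for line in text_output.splitlines():
        number, separator, sentence = line.partition("\t")
        # Skip the "N sentences found." line (no tab) and the "S#" header row.
        if separator and number.isdigit():
            rows.append((int(number), sentence))
    return rows


# Stand-in input in the same layout as the text output above.
sample_text = (
    "2 sentences found.\n"
    "S#\tSentence\n"
    "1\tWe are met on a great battle-field of that war.\n"
    "2\tIt is altogether fitting and proper that we should do this.\n"
)

for number, sentence in parse_text_output(sample_text):
    print(number, sentence)
```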