Primary language: Groovy. License: Apache License 2.0.

Natural Language Processing As A Service

The NLP-AAS project provides an asynchronous HTTP API for the Language Application Grid's SOAP services.

Calling Services

To invoke a service, POST a JSON document to https://api.lappsgrid.org/nlpaas/submit. The format of the JSON document is:

{
  "services": [ "service_id_1", "service_id_2", ... , "service_id_N" ],
  "type" : "text/plain",
  "payload" : "The text to process."
}

Where:

  1. services is a list of IDs of the services to be invoked. You can obtain a list of the services and their service IDs from the LAPPS Grid services page. There are also shortcuts for commonly used services from Stanford CoreNLP, Apache OpenNLP, Lingpipe, and GATE.
  2. type is the format of the data being POSTed. This must be either text/plain or application/json. If the type is application/json, the payload must be a LAPPS Data object.
  3. payload is the data to be processed.
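The request document described above can also be assembled programmatically. The following is a minimal Python sketch, not part of NLP-AAS: build_request is a hypothetical helper name, and the check for a non-empty service list is an assumption rather than documented server behavior.

```python
import json

# The two values the README allows for the "type" field.
ALLOWED_TYPES = ("text/plain", "application/json")


def build_request(services, payload, content_type="text/plain"):
    """Return the JSON text of a job request (hypothetical helper)."""
    if not services:
        # Assumption: an empty pipeline is not a useful request.
        raise ValueError("at least one service ID is required")
    if content_type not in ALLOWED_TYPES:
        raise ValueError(f"type must be one of {ALLOWED_TYPES}")
    return json.dumps({
        "services": list(services),
        "type": content_type,
        "payload": payload,
    })


if __name__ == "__main__":
    print(build_request(["stanford.tokenizer"], "The text to process."))
```

Writing the returned text to request.json gives a file that can be passed to curl with -d @request.json.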

The server responds with 201 Created and a Location header that gives the URL to use to check the status of the submitted job. The Location path is relative to https://api.lappsgrid.org/nlpaas.

$ curl -i -H "content-type: application/json" -d @request.json https://api.lappsgrid.org/nlpaas/submit
HTTP/1.1 201 
Location: /job/910ca618-a78c-40eb-9673-3ffa24a3fb1f
Content-Length: 0
Date: Mon, 16 Dec 2019 18:36:15 GMT
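The same submission can be sketched in Python with only the standard library. The function names (make_submit_request, job_url, submit) are hypothetical. The sketch assumes that the Location path is appended to the /nlpaas base, as the curl examples in this document suggest; urllib.parse.urljoin is avoided here because it would drop the /nlpaas prefix for a path that starts with a slash.

```python
import json
import urllib.request

BASE_URL = "https://api.lappsgrid.org/nlpaas"


def make_submit_request(document):
    """Build (but do not send) the POST request for /submit."""
    return urllib.request.Request(
        BASE_URL + "/submit",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def job_url(location):
    """Turn a Location value such as /job/<id> into a full URL.

    Assumption: the path is relative to BASE_URL.
    """
    return BASE_URL + location


def submit(document):
    """Send the job and return the URL to poll for its status."""
    with urllib.request.urlopen(make_submit_request(document)) as response:
        if response.status != 201:
            raise RuntimeError(f"unexpected status {response.status}")
        return job_url(response.headers["Location"])
```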

Use the /job URL to obtain information about the submitted job including:

  1. The job status (IN_QUEUE, IN_PROGRESS, DONE, or ERROR).
  2. The time (UTC) the job was submitted.
  3. The time processing started.
  4. The time processing finished (if successful).
  5. The time processing was halted due to an error.
  6. The total elapsed processing time (in milliseconds).
  7. An error message, if any, and
  8. The URL of the download file, if ready and available.

$ curl -i https://api.lappsgrid.org/nlpaas/job/910ca618-a78c-40eb-9673-3ffa24a3fb1f
HTTP/1.1 200 
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked
Date: Mon, 16 Dec 2019 18:41:42 GMT

{
    "id": "910ca618-a78c-40eb-9673-3ffa24a3fb1f",
    "elapsed": 2029,
    "status": "DONE",
    "submitted_at": "2019-12-16T18:36:15.808Z",
    "started_at": "2019-12-16T18:36:15.812Z",
    "finished_at": "2019-12-16T18:36:17.841Z",
    "result_URL": "/download/910ca618-a78c-40eb-9673-3ffa24a3fb1f"
}
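Because processing is asynchronous, a client usually polls the /job URL until the status is DONE or ERROR. The following Python sketch shows one way to do that. The names fetch_job and wait_for_job, the two-second interval, and the poll limit are all assumptions chosen for illustration; the fetch and sleep parameters exist only so the loop can be exercised without a network.

```python
import json
import time
import urllib.request

# Statuses after which the job will not change any more.
TERMINAL_STATUSES = {"DONE", "ERROR"}


def fetch_job(url):
    """GET the job document and decode its JSON body."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def wait_for_job(url, fetch=fetch_job, interval=2.0, max_polls=30,
                 sleep=time.sleep):
    """Poll until the job is DONE or ERROR, then return the job document."""
    for _ in range(max_polls):
        job = fetch(url)
        if job["status"] in TERMINAL_STATUSES:
            return job
        sleep(interval)  # IN_QUEUE or IN_PROGRESS: wait and try again
    raise TimeoutError(f"job did not finish after {max_polls} polls")
```

The caller still has to inspect the returned status, since a job that ends in ERROR is returned rather than raised.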

Send a GET request to the result_URL to download the processed document. Files will be available for download for 30 minutes after processing has completed.
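A short Python sketch of the download step follows. It assumes, by analogy with the Location header, that result_URL is a path relative to https://api.lappsgrid.org/nlpaas; the README does not state this explicitly. The names download_url and download are hypothetical.

```python
import urllib.request

BASE_URL = "https://api.lappsgrid.org/nlpaas"


def download_url(job):
    """Return the full download URL for a finished job document."""
    if job.get("status") != "DONE" or "result_URL" not in job:
        raise ValueError("job has no result to download")
    # Assumption: result_URL is relative to BASE_URL.
    return BASE_URL + job["result_URL"]


def download(job, destination):
    """Save the processed document to a local file."""
    with urllib.request.urlopen(download_url(job)) as response:
        with open(destination, "wb") as out:
            out.write(response.read())
```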

Validating a Pipeline of Services

To check that a series of services is valid, that is, that the input requirements of each service are satisfied, send the job request to https://api.lappsgrid.org/nlpaas/validate.

$ curl -i -H "content-type: application/json" -d @request.json https://api.lappsgrid.org/nlpaas/validate 
HTTP/1.1 100 

HTTP/1.1 200 
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked
Date: Mon, 16 Dec 2019 18:55:57 GMT

{"status":200,"message":"OK"}

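A pipeline that fails validation returns a 400 response whose reason field names the unmet requirement:

$ curl -i -H "content-type: application/json" -d @missing-annotations.json https://api.lappsgrid.org/nlpaas/validate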
HTTP/1.1 400 
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked
Date: Tue, 17 Dec 2019 18:09:15 GMT
Connection: close

{
    "status": 400,
    "message": "Required annotations are missing",
    "reason": "anc:opennlp.cloud.tokenizer_1.0.0 requires http://vocab.lappsgrid.org/Sentence"
}
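The two validation outcomes above can be handled in Python roughly as follows. This is a sketch under assumptions: validate and summarize are hypothetical names, and the code relies only on the status, message, and reason fields shown in the examples. Note that urllib raises HTTPError for a 400 response, so the JSON explanation has to be read from the exception.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "https://api.lappsgrid.org/nlpaas"


def validate(document):
    """POST a job request to /validate and return the decoded JSON reply."""
    request = urllib.request.Request(
        BASE_URL + "/validate",
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        # The error object is file-like; its body holds the JSON explanation.
        # Assumption: error bodies are JSON, as in the 400 example.
        return json.load(error)


def summarize(result):
    """Return (ok, detail) for a validation reply."""
    ok = result.get("status") == 200
    detail = result.get("message", "")
    if result.get("reason"):
        detail = f"{detail}: {result['reason']}"
    return ok, detail
```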

Shortcuts For Common Services

Shortcut IDs have been created for the most commonly used services from Stanford CoreNLP, Apache OpenNLP, Lingpipe, and GATE:

Service                  Stanford            OpenNLP             Lingpipe            GATE
Tokenizer                stanford.tokenizer  opennlp.tokenizer   lingpipe.tokenizer  gate.tokenizer
Sentence Segmenter       stanford.splitter   opennlp.splitter    lingpipe.splitter   gate.splitter
POS Tagger               stanford.tagger     opennlp.tagger      lingpipe.tagger     gate.tagger
Lemmatizer               N/A                 opennlp.lemmatizer  N/A                 N/A
Named Entity Recognizer  stanford.ner        opennlp.ner         lingpipe.ner        gate.ner

Shortcut IDs from different providers can be combined in a single request, for example:

{
  "services" : [ 
    "stanford.tokenizer", 
    "gate.splitter",
    "opennlp.tagger",
    "opennlp.lemmatizer", 
    "lingpipe.ner" 
  ],
  "type" : "text/plain",
  "payload": "Hello world."
}
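The shortcut table follows a provider.task naming pattern, which makes it easy to guard against combinations that do not exist (for example, a Stanford lemmatizer shortcut). The Python sketch below encodes the table as data; the SHORTCUTS mapping and the shortcut function are hypothetical helpers for illustration, not part of the service.

```python
# Providers that offer a shortcut for each task, copied from the table above.
ALL_PROVIDERS = {"stanford", "opennlp", "lingpipe", "gate"}
SHORTCUTS = {
    "tokenizer": ALL_PROVIDERS,
    "splitter": ALL_PROVIDERS,
    "tagger": ALL_PROVIDERS,
    "lemmatizer": {"opennlp"},  # N/A for the other providers
    "ner": ALL_PROVIDERS,
}


def shortcut(provider, task):
    """Return the shortcut ID, or raise if the table lists it as N/A."""
    if task not in SHORTCUTS:
        raise ValueError(f"unknown task: {task}")
    if provider not in SHORTCUTS[task]:
        raise ValueError(f"no {task} shortcut for {provider}")
    return f"{provider}.{task}"


if __name__ == "__main__":
    pipeline = [
        shortcut("stanford", "tokenizer"),
        shortcut("gate", "splitter"),
        shortcut("opennlp", "tagger"),
    ]
    print(pipeline)
```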

Pre-configured Pipelines

There are two pre-configured pipelines that can be used for Named Entity Recognition:

  1. stanford.ner.pipeline
  2. opennlp.ner.pipeline

There are no pre-configured GATE or Lingpipe NER pipelines at this time.

Finding Services

To find the available services that are capable of producing a given annotation type, send a GET request to https://api.lappsgrid.org/nlpaas/producers?type={type}. The type must be URL-encoded; for example, token#lemma becomes token%23lemma.

$ curl -i "https://api.lappsgrid.org/nlpaas/producers?type=token%23lemma"
HTTP/1.1 200 
Content-Type: application/json;charset=UTF-8
Transfer-Encoding: chunked
Date: Tue, 17 Dec 2019 00:55:26 GMT

{
    "status": 200,
    "producers": [
        "anc:opennlp.cloud.lemmatizer_pipeline_1.0.0",
        "anc:opennlp.cloud.ner_pipeline_1.0.0",
        "anc:opennlp.cloud.pipeline_1.0.0",
        "anc:opennlp.cloud.lemmatizer_1.0.0",
        "anc:stanford.cloud.ner_1.0.0",
        "anc:stanford.cloud.lemmatizer_1.0.0",
        "anc:gost_1.0.0-SNAPSHOT"
    ]
}
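The percent-encoding of the type parameter is easy to get wrong by hand, since an unencoded # would be treated as a URL fragment and never reach the server. A minimal Python sketch that builds the query with the standard library follows; producers_url and find_producers are hypothetical names.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://api.lappsgrid.org/nlpaas"


def producers_url(annotation_type):
    """Build the /producers URL with the type safely percent-encoded."""
    query = urlencode({"type": annotation_type})
    return BASE_URL + "/producers?" + query


def find_producers(annotation_type):
    """Return the list of service IDs from the "producers" field."""
    with urllib.request.urlopen(producers_url(annotation_type)) as response:
        return json.load(response)["producers"]


if __name__ == "__main__":
    print(producers_url("token#lemma"))
```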

Finding Information About Services

Each service in the LAPPS Grid returns metadata about the input formats it accepts, the annotations it produces, the annotations it requires in its input, licensing terms, and much more. To obtain metadata about a service, use the /metadata?id={id} endpoint.

$ curl -i "https://api.lappsgrid.org/nlpaas/metadata?id=opennlp.ner"
HTTP/1.1 200 
Content-Type: application/json
Content-Length: 859
Date: Tue, 17 Dec 2019 00:59:35 GMT

{
  "discriminator" : "http://vocab.lappsgrid.org/ns/meta",
  "payload" : {
    "$schema" : "https://vocab.lappsgrid.org/schema/1.1.0/metadata-schema.json",
    "name" : "org.lappsgrid.cloud.opennlp.soap.NamedEntityRecognizer:1.0.0-SNAPSHOT",
    "version" : "1.0.0-SNAPSHOT",
    "toolVersion" : "1.9.1",
    "description" : "Apache OpenNLP Named Entity Recognizer",
    "allow" : "http://vocab.lappsgrid.org/ns/allow#any",
    "license" : "http://vocab.lappsgrid.org/ns/license#apache-2.0",
    "requires" : {
      "format" : [ "http://vocab.lappsgrid.org/ns/media/jsonld#lif" ],
      "annotations" : [ "http://vocab.lappsgrid.org/Token", "http://vocab.lappsgrid.org/Token#pos" ]
    },
    "produces" : {
      "format" : [ "http://vocab.lappsgrid.org/ns/media/jsonld#lif" ],
      "annotations" : [ "http://vocab.lappsgrid.org/NamedEntity" ]
    }
  }
}
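The requires and produces sections of the metadata are the parts most useful when assembling a pipeline by hand. The Python sketch below extracts them and reports which required annotations are not yet available. Both function names are hypothetical, and the code assumes the metadata has the shape shown in the example above.

```python
def annotation_contract(metadata):
    """Return (required, produced) annotation lists from a metadata document."""
    payload = metadata["payload"]
    required = payload.get("requires", {}).get("annotations", [])
    produced = payload.get("produces", {}).get("annotations", [])
    return required, produced


def unmet_requirements(metadata, available):
    """List required annotations that are not in the available collection."""
    required, _ = annotation_contract(metadata)
    return [annotation for annotation in required if annotation not in available]


if __name__ == "__main__":
    # Trimmed copy of the opennlp.ner metadata shown above.
    ner = {
        "payload": {
            "requires": {"annotations": [
                "http://vocab.lappsgrid.org/Token",
                "http://vocab.lappsgrid.org/Token#pos",
            ]},
            "produces": {"annotations": ["http://vocab.lappsgrid.org/NamedEntity"]},
        }
    }
    print(unmet_requirements(ner, {"http://vocab.lappsgrid.org/Token"}))
```

For a definitive answer, the /validate endpoint remains the authority; a local check like this only mirrors what the metadata declares.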