/serratus-summary-api

serving Serratus summary data via a public API

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

serratus-summary-api

Example routes

Local usage

Setup

Add file env.sh:

export SQL_USERNAME=web_api
export SQL_PASSWORD=serratus

Install dependencies:

# [optional] create/load virtualenv
pip install -r requirements.txt

Start server

bash run.sh

Test

bash test.sh

AWS Setup

Elastic Beanstalk

  • Application name: serratus-summary-api-flask
  • Environment
    • Web server environment
    • Name: serratus-summary-api
    • Platform: Python (3.7 running on 64bit Amazon Linux 2)
    • Sample app (will be overridden by CodePipeline deployment)

After creation:

  • Add load balancer listener
    • Port: 443
    • Protocol: HTTPS
    • SSL certificate: *.serratus.io
  • Processes
    • Health check path: /summary/nucleotide/run=ERR2756788
  • Environment variables
    • Add SQL_USERNAME and SQL_PASSWORD, using the values from env.sh

CodePipeline

  1. Settings
    • Pipeline name: serratus-summary-api-flask
    • New service role
  2. Source stage
    • Source provider: GitHub (Version 1)
    • Select this repo/branch
    • Change detection: GitHub webhooks
  3. Build stage: skip
  4. Deploy stage
    • Provider: AWS Elastic Beanstalk
    • Region: us-east-1
    • Application/environment names from above

Route 53

  • A record for api.serratus.io -> Elastic Beanstalk endpoint

RDS

See https://github.com/ababaian/serratus/wiki/Serratus-SQL-Database-Management

Debugging

  • Disable caches: in config.py set CACHE_DEFAULT_TIMEOUT = 1 (timeout after 1 second)

TODO

  • /protein/*
  • /rdrp/*
  • handle timeouts e.g.

    DatabaseError: current transaction is aborted, commands ignored until end of transaction block

  • investigate CACHE_TYPE = 'filesystem' and CACHE_THRESHOLD
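For the aborted-transaction item in the list above, one common approach is to roll back the connection whenever a query fails, so that later queries on the same connection are not rejected. The following is only a sketch under assumptions: `fetch_all` is a hypothetical helper, `conn` is assumed to be any Python DB-API connection, and nothing here reflects how this repo actually runs its queries.

```python
def fetch_all(conn, sql, params=()):
    """Run a query on a DB-API connection (hypothetical helper, not from this repo).

    If the statement fails, roll back before re-raising so the connection
    is usable for the next request.
    """
    cursor = conn.cursor()
    try:
        cursor.execute(sql, params)
        return cursor.fetchall()
    except Exception:
        # In PostgreSQL, a failed statement leaves the transaction aborted;
        # every later command is rejected until a rollback is issued.
        conn.rollback()
        raise
    finally:
        cursor.close()
```

The caller still sees the original exception and can turn it into an error response; the rollback only prevents one failed query from breaking the ones that follow it.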

(Beta) Data API

The Data API is meant to be a general-purpose interface for accessing the data in the serratus db. It is a thin REST layer on top of the db that lets you access any of its tables (as one would do with SQL) in a uniform and predictable way.

The Data API resides on the /data endpoint. Each table is mapped to a path under this endpoint, like /data/<table>.

POST Request

The primary way to request data from the Data API is through a POST request to any /data/<table> path with a JSON payload that may contain some of the following keys:

  • _limit: number of rows to retrieve, defaults to 8

  • _offset: retrieves rows from the specified offset, defaults to 0

  • <column_name>: if a key is a column name of the queried table, you can pass a single value or a list of values to limit the query to a subset of matches. In code, this adds a WHERE <column_name> in (values) clause to the query.
    The behavior depends on the column type and the shape of the value:

    • if the column is of a numeric type AND the value is a list of two elements: it will return the range of matches between the two list elements (inclusive), i.e. WHERE <column_name> >= value[0] AND <column_name> <= value[1]. If one of these values is an empty string, it is dropped from the clause, e.g. the list ['', 99] will produce a WHERE <column_name> <= value[1] clause

    • if the column is a character-based type (varchar, text): it will return all values matching exactly the ones on the list provided i.e. WHERE <column_name> in (values)
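The filter rules above can be sketched in a few lines of Python. This is an illustration of the described behavior, not the API's actual code: `build_filter` and its `is_numeric` flag are hypothetical names, and the `%s` placeholder style is an assumption about the database driver.

```python
from typing import Any, List, Tuple


def build_filter(column: str, values: Any, is_numeric: bool) -> Tuple[str, List[Any]]:
    """Return a (sql_fragment, params) pair for one column filter.

    Hypothetical helper. `column` is interpolated into the SQL text, so a
    real implementation must first check it against the table's known columns.
    """
    # A single value is treated as a one-element list.
    if not isinstance(values, (list, tuple)):
        values = [values]
    values = list(values)

    # Numeric column and exactly two elements: inclusive range, where an
    # empty string means "no bound on this side".
    if is_numeric and len(values) == 2:
        low, high = values
        clauses, params = [], []
        if low != "":
            clauses.append(f"{column} >= %s")
            params.append(low)
        if high != "":
            clauses.append(f"{column} <= %s")
            params.append(high)
        # If both bounds are empty, the fragment is an empty string.
        return " AND ".join(clauses), params

    # Everything else: exact match against the listed values.
    placeholders = ", ".join(["%s"] * len(values))
    return f"{column} IN ({placeholders})", values
```

For example, `build_filter("score", ["", 99], True)` yields only an upper bound, while `build_filter("run_id", ["DRR000614", "DRR001252"], False)` yields an `IN` clause with two placeholders.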

GET Request

GET requests to any /data/<table> path get resolved by translating them to a corresponding POST request, as follows:

  • The path remains the same.
  • Any URL parameters in the GET request are sent as part of the JSON payload of the POST request. Example: ?param_one=one&param_two=two&param_three=three will send a {"param_one":"one","param_two":"two","param_three":"three"} payload.
  • All parameters are mapped as string values. If the string contains commas, it will be split into a list and sent as such. This is useful for queries where you want to get values that match a specific set. Note that if you're querying against a column with a numeric type, the way this list is handled by the POST request allows you to easily query ranges of values. Examples:
    • greater than or equal: some_column=1234, (note the trailing comma) will become some_column:['1234', ''] in the POST request, which will become a WHERE some_column >= 1234 clause

    • less than or equal: some_column=,1234 (note the leading comma) will become some_column:['', '1234'] in the POST request, which will become a WHERE some_column <= 1234 clause
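The GET-to-POST translation described above can be sketched with the standard library. `query_to_payload` is a hypothetical name used for illustration; the server's real implementation may differ in details such as how repeated parameters are handled.

```python
from urllib.parse import parse_qsl


def query_to_payload(query_string: str) -> dict:
    """Turn a URL query string into the JSON payload of the equivalent POST.

    Hypothetical helper: every value stays a string, and a value that
    contains a comma is split into a list of strings.
    """
    payload = {}
    # keep_blank_values=True so a parameter with an empty value is not dropped.
    for key, value in parse_qsl(query_string, keep_blank_values=True):
        payload[key] = value.split(",") if "," in value else value
    return payload
```

Because splitting "1234," on the comma leaves an empty string as the second element, the trailing or leading comma is what produces the open-ended range lists shown in the examples above.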

GET requests are meant for trivial queries to the db. One big difference from POST requests is that all GET responses are cached for at least a day.

Authentication

The Data API uses HTTP Basic authentication.

The credentials are serratus:serratus.

This doesn't do much at the moment, but it is in place in case we need some sort of access/rate control in the future, and also to make sure that you read this small guide before using the API.
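If your HTTP client has no built-in option for Basic authentication, the header can be built by hand. A minimal sketch, where `basic_auth_header` is a hypothetical helper name:

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header for HTTP Basic authentication."""
    # Basic auth is the base64 encoding of "username:password".
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Header for the credentials given above.
headers = basic_auth_header("serratus", "serratus")
```

This is what curl's -u option does for you in the examples that follow.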

Examples

Sample POST request to get data for two run_ids from the rfamily table:

curl -H 'Content-Type: application/json' -X POST -d '{"limit":8,"offset":0,"run_id":["DRR000614","DRR001252"]}' -u 'serratus:serratus' https://api.serratus.io/data/rfamily

Same query using a GET request and default field values:

curl -u 'serratus:serratus' 'https://api.serratus.io/data/rfamily?run_id=DRR000614,DRR001252'

Same query as before, but now retrieving only matches with percent_identity greater than or equal to a certain value (note the trailing comma, and the quotes that keep the shell from interpreting the &):

curl -u 'serratus:serratus' 'https://api.serratus.io/data/rfamily?percent_identity=60,&run_id=DRR000614,DRR001252'

Sample response:

{
  "data": [
    {
      "run_id": "DRR000614",
      "phylum_name": "Kitrinoviricota",
      "family_name": "Alphaflexiviridae",
      "family_group": "Alphaflexiviridae-1",
      "coverage_bins": "___________________:_____",
      "score": 1,
      "percent_identity": 73,
      "depth": 0.1,
      "n_reads": 2,
      "aligned_length": 22
    },
    ... x 10 times
  ]
}

Errors look like this:

{
  "error": "<some error message>"
}
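Given the two response shapes above, a client can check for the "error" key before reading "data". A minimal sketch, where `DataApiError` and `rows_from_response` are hypothetical names and nothing is assumed about the HTTP status code that accompanies an error:

```python
import json


class DataApiError(Exception):
    """Raised when the response body carries an "error" message (hypothetical)."""


def rows_from_response(body: str) -> list:
    """Parse a Data API response body and return the list of rows."""
    parsed = json.loads(body)
    if "error" in parsed:
        raise DataApiError(parsed["error"])
    return parsed["data"]
```

Each returned row is a dict keyed by column name, so `rows[0]["run_id"]` reads the first row's run id.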

HTTP requests will most likely work with your favorite framework/language. To get started, search for how to make HTTP requests with it.
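As one possible starting point, here is a Python equivalent of the curl POST example, using only the standard library. The names `build_post_request` and `send` are hypothetical; the URL and credentials are the ones given in this guide. Building the request is kept separate from sending it so the request can be inspected without touching the network.

```python
import base64
import json
import urllib.request

BASE_URL = "https://api.serratus.io/data"


def build_post_request(table, payload, username="serratus", password="serratus"):
    """Build (but do not send) a POST request for /data/<table>."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{BASE_URL}/{table}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON body.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses; whether
    the API uses such status codes for its error bodies is not stated here.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage would look like `send(build_post_request("rfamily", {"run_id": ["DRR000614", "DRR001252"]}))`, which returns the parsed response dict.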