/beacon2-pi-api

Beacon v2 Production Implementation (API)

Primary LanguagePythonApache License 2.0Apache-2.0

Beacon v2 Production Implementation

Welcome to Beacon v2 Production Implementation (B2PI). This is an application that makes an instance of Beacon v2 be production ready. To go to the beacon reference implementation (test instance) please visit B2RI. For further information on how to use this app, please visit B2RI docs

Main changes from B2RI

  • Handlers of the endpoints are classes, not functions
  • Unit testing has been developed for the application, starting with 108 unit tests that cover 4000 lines of code approximately (100%)
  • Concurrency testing has been applied for this new beacon instance, showing results of responses for more than 3 million genomic variants splitted in different datasets in less than 100 millisecs, for a total of 1000 requests made by 10 users per second at the same time.
  • Linking ids to a dataset in a yaml file is not needed anymore
  • A couple more indexes for mongoDB have been applied, that, in addition to the restructuration of the code, have improved the quickness of the responses
  • Authentication/Authorization is now applied as a decorator, not as a different container
  • LOGS now show more relevant information about the different processes (from request to response) including transaction id, the time of execution of each function and the initial call and the return call
  • Exceptions now are raised from the lower layer to the top layer, with information and status for the origin of the exception
  • Architecture of the code is not dependent on a particular database, meaning that different types of databases (and more than one) can be potentially applied to this instance (although now only MongoDB is the one developed)
  • Parameters are sanitized
  • Users can manage what entry types want their beacon to show by editing a manage conf file inside source

TLS configuration

To enable TLS for the Becaon API set beacon_server_crt and beacon_server_key to the full paht of the server certificate and server key in beacon/conf/conf.py file.

TLS secured MongoDB

Edit the file beacon/connections/mongo/conf.py and set database_certificate to the full path to the client certificate. If a private CA is used also set the database_cafile to the full path to the CA certificate.

  • The MongoDB client certificate should be in the combined PEM format client.key + "\n" + client.crt

Prerequisites

You should have installed:

  • Docker
  • Docker Compose
  • Data from RI TOOLS. Please, bear in mind that the datasetId for your records must match the id for the dataset in the /datasets entry type.

Light up the database and the Beacon

Up the containers

If you are using a build with all the services in the same cluster, you can use:

docker-compose up -d --build

Load the data

To load the database we execute the following commands:

docker cp /path/to/analyses.json mongoprod:tmp/analyses.json
docker cp /path/to/biosamples.json mongoprod:tmp/biosamples.json
docker cp /path/to/cohorts.json mongoprod:tmp/cohorts.json
docker cp /path/to/datasets.json mongoprod:tmp/datasets.json
docker cp /path/to/genomicVariations.json mongoprod:tmp/genomicVariations.json
docker cp /path/to/individuals.json mongoprod:tmp/individuals.json
docker cp /path/to/runs.json mongoprod:tmp/runs.json
docker exec mongoprod mongoimport --jsonArray --uri "mongodb://root:example@127.0.0.1:27017/beacon?authSource=admin" --file /tmp/datasets.json --collection datasets
docker exec mongoprod mongoimport --jsonArray --uri "mongodb://root:example@127.0.0.1:27017/beacon?authSource=admin" --file /tmp/analyses.json --collection analyses
docker exec mongoprod mongoimport --jsonArray --uri "mongodb://root:example@127.0.0.1:27017/beacon?authSource=admin" --file /tmp/biosamples.json --collection biosamples
docker exec mongoprod mongoimport --jsonArray --uri "mongodb://root:example@127.0.0.1:27017/beacon?authSource=admin" --file /tmp/cohorts.json --collection cohorts
docker exec mongoprod mongoimport --jsonArray --uri "mongodb://root:example@127.0.0.1:27017/beacon?authSource=admin" --file /tmp/genomicVariations.json --collection genomicVariations
docker exec mongoprod mongoimport --jsonArray --uri "mongodb://root:example@127.0.0.1:27017/beacon?authSource=admin" --file /tmp/individuals.json --collection individuals
docker exec mongoprod mongoimport --jsonArray --uri "mongodb://root:example@127.0.0.1:27017/beacon?authSource=admin" --file /tmp/runs.json --collection runs

This loads the JSON files inside of the data folder into the MongoDB database container. Each time you import data you will have to create indexes for the queries to run smoothly. Please, check the next point about how to Create the indexes.

Create the indexes

Remember to do this step every time you import new data!!

You can create the necessary indexes running the following Python script:

docker exec beaconprod python /beacon/connections/mongo/reindex.py

Fetch the ontologies and extract the filtering terms

This step consists of analyzing all the collections of the Mongo database for first extracting the ontology OBO files and then filling the filtering terms endpoint with the information of the data loaded in the database.

You can automatically fetch the ontologies and extract the filtering terms running the following script:

 docker exec beaconprod python beacon/connections/mongo/extract_filtering_terms.py

Get descendant and semantic similarity terms

  • If you have the ontologies loaded and the filtering terms extracted* , you can automatically get their descendant and semantic similarity terms by following the next two steps:
  1. Add your .obo files inside ontologies naming them as the ontology prefix in lowercase (e.g. ncit.obo) and rebuild the beacon container with:

  2. Run the following script:

 docker exec beaconprod python beacon/connections/mongo/get_descendants.py

Check the logs

Check the logs until the beacon is ready to be queried:

docker-compose logs -f beaconprod

Usage

You can query the beacon using GET or POST. Below, you can find some examples of usage:

For simplicity (and readability), we will be using HTTPie.

Using GET

Querying this endpoit it should return the 13 variants of the beacon (paginated):

http GET http://localhost:5050/api/g_variants

You can also add request parameters to the query, like so:

http GET http://localhost:5050/api/individuals?filters=NCIT:C16576,NCIT:C42331

Using POST

You can use POST to make the previous query. With a request.json file like this one:

{
    "meta": {
        "apiVersion": "2.0"
    },
    "query": {
        "requestParameters": {
    "alternateBases": "G" ,
    "referenceBases": "A" ,
"start": [ 16050074 ],
            "end": [ 16050568 ],
	    "referenceName": "22"
        },
        "filters": [],
        "includeResultsetResponses": "HIT",
        "pagination": {
            "skip": 0,
            "limit": 10
        },
        "testMode": false,
        "requestedGranularity": "record"
    }
}

You can execute:

curl \
  -H 'Content-Type: application/json' \
  -X POST \
  -d '{
    "meta": {
        "apiVersion": "2.0"
    },
    "query": {
        "requestParameters": {
    "alternateBases": "G" ,
    "referenceBases": "A" ,
"start": [ 16050074 ],
            "end": [ 16050568 ],
	    "referenceName": "22"
        },
        "filters": [],
        "includeResultsetResponses": "HIT",
        "pagination": {
            "skip": 0,
            "limit": 10
        },
        "testMode": false,
        "requestedGranularity": "record"
    }
}' \
  http://localhost:5050/api/g_variants

But you can also use complex filters:

{
    "meta": {
        "apiVersion": "2.0"
    },
    "query": {
        "filters": [
            {
                "id": "UBERON:0000178",
                "scope": "biosample",
                "includeDescendantTerms": false
            }
        ],
        "includeResultsetResponses": "HIT",
        "pagination": {
            "skip": 0,
            "limit": 10
        },
        "testMode": false,
        "requestedGranularity": "count"
    }
}

You can execute:

http POST http://localhost:5050/api/biosamples --json < request.json

And it will use the ontology filter to filter the results.

Allowing authentication/authorization

Go to auth folder and create an .env file with the next Oauthv2 OIDC Identity Provider Relying Party known information:

CLIENT_ID='your_idp_client_id'
CLIENT_SECRET='your_idp_client_secret'
USER_INFO='https://login.elixir-czech.org/oidc/userinfo'
INTROSPECTION='https://login.elixir-czech.org/oidc/introspect'
ISSUER='https://login.elixir-czech.org/oidc/'
JWKS_URL='https://login.elixir-czech.org/oidc/jwk'

For Keycloak IDP, an "aud" parameter will need to be added to the token's mappers, matching the Audience for the Keycloak realm.

Making a dataset public/registered/controlled

In order to assign the security level for a dataset in your beacon, please go to permissions/datasets and add your dataset into the .yml corresponding file you wish to assign the permissions for it.

Managing configuration

You can edit some parameters for your Beacon v2 API that are in conf.py. For that, edit the variables you see fit, save the file and restart the API by executing the next command:

docker-compose restart beaconprod

Managing source

You can edit some parameters concerning entry types developed for your Beacon in manage.py. For that, change to True the entry types you want to have developed and shown with data for your beacon and execute the next command:

docker-compose restart beaconprod

Tests report

Beacon prod concurrency test