datopian/data-api

[epic] Read Service with CKAN v2 compatibility

Opened this issue · 4 comments

We are using Hasura as the base wrapper around Postgres.

We need a backwards compatible API

# dapi-endpoint includes version e.g. data-api.ckan.com/v1/ ...
{dapi-endpoint}/{dapi-id}/graphql  # raw Hasura
{dapi-endpoint}/{dapi-id}/datastore_search  # CKAN DataStore /api/3/datastore_search

# let's see about this one (if hard we may skip for now)
{dapi-endpoint}/{dapi-id}/datastore_search_sql/  # CKAN DS SQL /api/3/datastore_search_sql

Terminology

  • dapi-id = table name in postgres (?) tbc
    • NB: hasura does not allow - in table names ...

Acceptance

  • We have a new graphql endpoint
    • You can get CSV and xlsx data from it (?)
  • We have backwards compatible datastore_search
  • We have backwards compatible datastore_search_sql
  • We support CORS
  • We have a hook for metrics (to google analytics)

BONUS

  • We implement existing permissions (what are they?)

Tasks

  • Setup - create / define repo
    • Bootstrap an Express app (?use a boilerplate instead of doing it from scratch https://expressjs.com/en/starter/generator.html)
    • Setup tests - what should we use here? Supertest remains the default for node http server tests ...
    • Setup mocks / fixtures ie. the fixture postgres database
      • For mocks we can either use nock or mitm.js (interesting to give a try)
    • Setup CI to run the tests
  • Deploy to Hasura-dev on our cluster
  • Create the graphql endpoint
    • should serve as a proxy from/to hasura graphql server
    • receives a reqeust => preps a graphql query => hits hasura => pipes back the response
  • Create datastore_search endpoint
  • Create datastore_search_sql endpoint

Analysis

graph TD

subgraph Read API service
  hasura[Hasura]
  apiwrapper[NodeJS wrapper app]
end

postgres[Postgres]

apiwrapper --> hasura
hasura --> postgres
Loading

@EvgeniiaVak @leomrocha can you update on where we are with this?

@rufuspollock At the moment we have a basic API (as explained in the README file) that is not backwards compatible. It can take graphql queries and queries with other parametrizations.

We could make a few changes to have backwards compatibility adding endpoints that behind the scenes convert the calls to JSON, or doing a conversion in the same endpoints but I would suggest we keep the new API and deprecate the CKAN one adding specific compatibility endpoints giving some deadline (a few years?) to move to the new ones.

The datastore_search_sql endpoint should be deprecated a directly exposing the DB has many problems for portability and maintenance.

Not all the endpoints make sense with the new syntax as some can be merged.

In the case of making a GraphQL query, the server redirects to graphql

Downloads for CSV and XLSX should be added but I wouldn't do that as a graphql response (endpoint), instead what I would do is to maintain the graphql query and as response do a file stream directly, or saving to a file server with some file retention policy and returning the file url as a redirect and download.

@rufuspollock @anuveyatsu @EvgeniiaVak I would like your thoughts here.

This issue might be already outdated and some discussion is needed on the CKAN backwards compatibility.

I would like to avoid exposing the datastore_search_sql endpoint as it not only makes another security risk headache but I think we should deprecate it. Maybe on the services that are still using this endpoint keep using CKAN while the systems are migrated to the new GraphQL and datastore_search endpoints.

For the complete backwards compatibility of datastore_search with CKAN there are still some things to implement that with the changes make no real sense or are not easily feasible. If is needed for some current migration we can implement the needed features, but I would make it as per need basis only and not as a general requirement. The same if a migration can be done with the datastore_search_sql endpoint, I would not include it in the data-api distribution but only do it in the cases that need it for migrating from CKAN and with the view of taking it out in the future.

I think one of the most important features to do right now is to implement access rights and security.

I would like to avoid exposing the datastore_search_sql endpoint as it not only makes another security risk headache but I think we should deprecate it. Maybe on the services that are still using this endpoint keep using CKAN while the systems are migrated to the new GraphQL and datastore_search endpoints.

I hear you .... and some people do use this so we will need it. I think for this we could implement a function purely in nodejs wrapper (outside Hasura). It's a relatively straightforward to do i think.

For the complete backwards compatibility of datastore_search with CKAN there are still some things to implement that with the changes make no real sense or are not easily feasible. If is needed for some current migration we can implement the needed features, but I would make it as per need basis only and not as a general requirement. The same if a migration can be done with the datastore_search_sql endpoint, I would not include it in the data-api distribution but only do it in the cases that need it for migrating from CKAN and with the view of taking it out in the future.

Can we make sure what is and is not implemented is very clearly documented in the README - is that something we could do right now.

I think one of the most important features to do right now is to implement access rights and security.

Can you open a separate issue about this if we don't have it already. For this i'm anticipating that we move to the JWT approach using ckanext-authz i.e. ckan using ckanext-authz gives out tokens that this data api service can then use to authorize requests.