
GA4GH Search API specification.

Apache License 2.0

Search

Search is a framework for searching genomics and clinical data.

The Search framework consists of a collection of complementary standards that data custodians can implement to make their biomedical data more discoverable.

The schemas for most components of the framework are developed by the Discovery Work Stream of the Global Alliance for Genomics & Health.

Background

The GA4GH has previously developed two standards for discovery. Beacon is a standard for discovery of genomic variants, while Matchmaker is a standard for discovery of subjects with certain genomic and phenotypic features. Implementations of these standards have been linked into federated networks (Beacon Network and Matchmaker Exchange, respectively).

Each standard (and corresponding network) has been successful in its own right. It was acknowledged that it would be broadly useful to develop standards that abstracted common utilities for building searchable, federated networks for a variety of applications in genomics and health.

The Discovery Work Stream develops Search as a general-purpose framework for building federatable search-based applications.

Goals

  • federation: It is possible to federate searches across multiple implementations. Federations of the search framework reference common schemas and properties.
  • backend agnostic: It is possible to implement the framework across a large variety of backend datastores.

Out of scope

  • developing data models: The Search framework does not define data models. It defers that effort to others in the GA4GH or outside implementers.
  • application development: The Search framework does not prescribe a specific application. It is intentionally general-purpose. It defers to other efforts in the Discovery Work Stream, GA4GH, and beyond to build domain-specific applications.

How to view

The Search API is specified in OpenAPI in search-api.yaml, which you can view using Swagger Editor.

How to test

Use the Swagger Validator Badge, or its OAS Validator wrapper, to validate the YAML file.

Complementary standards

The following standards are complementary but not required by the Search framework:

  • The Service Info standard can be used to describe the service
  • The Service Registry standard can be used to create networks of search services

Architecture

Components

The search API consists of Table and Query APIs, describing search results and queries, respectively.

Discovery Search API Specification

See SEARCHSPEC.md

Use cases

See USECASES.md

Examples

Implementations and tooling

Tables-in-a-bucket (no-code implementation)

The specification allows for a no-code implementation as a collection of files served statically (e.g. in a cloud bucket, or a Git repository). To do this, you need the following JSON files:

  • tables: served in response to GET /tables
  • table/{table_name}/info: served in response to GET /table/{table_name}/info. e.g. a table with the name mytable should have a corresponding file table/mytable/info
  • table/{table_name}/data: served in response to GET /table/{table_name}/data. e.g. a table with the name mytable should have a corresponding file table/mytable/data
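  • table/{table_name}/data_{pageNumber}: additional pages of data, for tables split across several files. Each page is linked from the next_page_url of the page before it, starting with table/{table_name}/data (e.g. table/mytable/data).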
  • table/{table_name}/data_models/{schemaFile}: Though not required, data models may be linked via $ref. Data models can also be stored as static JSON documents, and be referred to by relative or absolute URLs.
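The file layout above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the specification: the helper names write_static_table and read_all_rows are hypothetical, the JSON field names (tables, name, data_model, data, pagination) should be checked against SEARCHSPEC.md, and numbering later pages from data_2 is an arbitrary choice.

```python
import json
import tempfile
from pathlib import Path


def write_static_table(root: Path, table_name: str, pages: list, data_model: dict) -> None:
    """Write the static JSON files for a single table under `root`."""
    table_dir = root / "table" / table_name
    table_dir.mkdir(parents=True, exist_ok=True)

    # tables: served in response to GET /tables
    (root / "tables").write_text(json.dumps({"tables": [{"name": table_name}]}))

    # table/{table_name}/info: served in response to GET /table/{table_name}/info
    info = {"name": table_name, "data_model": data_model}
    (table_dir / "info").write_text(json.dumps(info))

    # The first page is "data"; later pages are "data_{pageNumber}".
    for index, rows in enumerate(pages):
        file_name = "data" if index == 0 else f"data_{index + 1}"
        body = {"data_model": data_model, "data": rows}
        if index + 1 < len(pages):
            # Assumption: a relative URL that resolves against the current page.
            body["pagination"] = {"next_page_url": f"data_{index + 2}"}
        (table_dir / file_name).write_text(json.dumps(body))


def read_all_rows(root: Path, table_name: str) -> list:
    """Collect every row by following next_page_url from the first page."""
    rows = []
    page = root / "table" / table_name / "data"
    while page is not None:
        body = json.loads(page.read_text())
        rows.extend(body["data"])
        next_url = body.get("pagination", {}).get("next_page_url")
        page = page.parent / next_url if next_url else None
    return rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_root = Path(tmp)
        demo_model = {"type": "object"}  # placeholder data model
        write_static_table(demo_root, "mytable", [[{"id": 1}], [{"id": 2}]], demo_model)
        print(len(read_all_rows(demo_root, "mytable")), "rows read back")
```

Uploading the resulting directory to a bucket or committing it to a Git repository would then expose the table without any server-side code, provided the host serves the files at the paths listed above.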

A concrete example test implementation is available (list endpoint), with documentation.

Google Sheets implementation

A Google Sheets spreadsheet can also be exposed through the tables API using the sheets adapter, located here.

Security

Sensitive information transmitted over public networks, such as access tokens and human genomic data, MUST be protected using Transport Layer Security (TLS) version 1.2 or later, as specified in RFC 5246.
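On the client side, the TLS requirement can be enforced explicitly rather than left to library defaults. The following is a minimal sketch using Python's standard ssl module; the function name make_tls_context is hypothetical.

```python
import ssl


def make_tls_context() -> ssl.SSLContext:
    """Return a client context that refuses connections below TLS 1.2."""
    # The default context verifies certificates and host names.
    context = ssl.create_default_context()
    # Reject TLS 1.0 and 1.1 (and older SSL versions) during the handshake.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context
```

The returned context can be passed to clients that accept one, for example through the context argument of urllib.request.urlopen.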

If the data holder requires client authentication and/or authorization, then the client’s HTTPS API request MUST present an OAuth 2.0 bearer access token as specified in RFC 6750, in the Authorization request header field with the Bearer authentication scheme:

Authorization: Bearer [access_token]
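As an illustration, a client could attach the header as follows. This is a sketch only: the function name build_authorized_request and the URL https://search.example.org/tables are hypothetical, and how the access token is obtained is outside its scope.

```python
import urllib.request


def build_authorized_request(url: str, access_token: str) -> urllib.request.Request:
    """Build a request that carries an OAuth 2.0 bearer token."""
    # Bearer tokens are credentials, so refuse to send them over plain HTTP.
    if not url.lower().startswith("https://"):
        raise ValueError("bearer tokens must only be sent over HTTPS")
    headers = {"Authorization": f"Bearer {access_token}"}
    return urllib.request.Request(url, headers=headers)


# Hypothetical endpoint and token; nothing is sent until the request is opened.
request = build_authorized_request("https://search.example.org/tables", "example-token")
```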

The policies and processes used to perform user authentication and authorization, and the means through which access tokens are issued, are beyond the scope of this API specification. GA4GH recommends the use of the OpenID Connect and OAuth 2.0 framework (RFC 6749) for authentication and authorization.

CORS

Cross-origin resource sharing (CORS) is an essential technique used to overcome the same-origin policy enforced by browsers. This policy restricts a webpage from making a request to another website and leaking potentially sensitive information. However, the same-origin policy is also a barrier to using open APIs. GA4GH open API implementers should enable CORS to an acceptable level as defined by their internal policy. Public API implementations should allow requests from any origin.
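For a public, read-only deployment, permissive CORS headers might look like the sketch below, built on Python's standard http.server module. The names PUBLIC_CORS_HEADERS and CORSRequestHandler are hypothetical, and the exact set of allowed methods and headers is an assumption that each implementer should adapt to their own policy.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Assumed policy for a public API: any origin may issue read-only requests.
PUBLIC_CORS_HEADERS = {
    "Access-Control-Allow-Origin": "*",
    "Access-Control-Allow-Methods": "GET, OPTIONS",
    "Access-Control-Allow-Headers": "Authorization, Content-Type",
}


class CORSRequestHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory with CORS headers attached."""

    def end_headers(self):
        # Add the CORS headers to every response before the header block closes.
        for name, value in PUBLIC_CORS_HEADERS.items():
            self.send_header(name, value)
        super().end_headers()

    def do_OPTIONS(self):
        # Answer CORS preflight requests with an empty 204 response.
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the server on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), CORSRequestHandler).serve_forever()
```

Calling serve() from a directory of static JSON files gives a local test server that browser-based clients on other origins can query.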

GA4GH published a CORS best practices document, which implementers should refer to for guidance when enabling CORS on public API instances.

How to contribute

The GA4GH is an open community that strives for inclusivity. Guidelines for contributing to this repository are listed in CONTRIBUTING.md. Teleconferences and corresponding meeting minutes are open to the public. To learn how to contribute to this effort, please email Rishi Nag (rishi.nag@ga4gh.org).

How to notify GA4GH of potential security flaws

Please send an email to security-notification@ga4gh.org.