TranslatorArchitecture

Process

This repository tracks decision-making for the Translator architecture.

This README documents the current strawman architecture. Changes must be made via pull requests. Questions or discussion not easily tied to a specific pull request belong in GitHub issues.

Definitions

  • Message: A Message object as defined in the Translator Reasoner API here
  • KP (Knowledge Provider): a Translator software component, not a project team
  • ARA (Automated Relay Agent): a Translator software component, not a project team
  • KS (Knowledge Source): a non-Translator source of information that can be ingested to produce a KP.

Architecture Principles

  1. The goal is to create a single integrated product from federated services and data
  2. Components communicate with one another as follows:
    1. The ARS broadcasts a query (Message) to one or more ARAs.
    2. ARAs respond to the ARS with a Message.
    3. ARAs send query Messages to KPs.
    4. KPs respond to ARAs with a Message.
  3. Interfaces:
    1. All communication between the ARS and ARAs conforms to the ReasonerAPI Message spec
    2. KPs can expose their information using any of these methods:
      1. ReasonerAPI Message
      2. Any SmartAPI-annotated interface
      3. A file dump conforming to KGX standards
    3. The Translator consortium will develop tools to automatically:
      1. proxy ReasonerAPI calls to SmartAPI calls, and
      2. deploy ReasonerAPI interfaces over KGX file dumps.
    4. Subsequent requirements on KPs in this document will specify their application to ReasonerAPI, SmartAPI, and/or KGX interfaces.
  4. Entities in any ReasonerAPI message (ARS/ARA or ARA/KP) or KGX file-based communication are represented using compact URIs (CURIEs), which must be expandable to full IRIs using a JSON-LD context file provided by the biolink-model. A non-ReasonerAPI, SmartAPI-registered KP must provide sufficient information in the registry to allow automated conversion of the identifiers of the entities it returns to biolink-model CURIEs.
  5. Node Identifiers
    1. KPs must expose machine-readable information about the types of node identifiers that they consume and produce.
    2. ARAs or other integration tools such as KGX will perform node identifier equivalence translations.
    3. The consortium will produce or adopt equivalent identifier sets, which will be shared across Translator tools. Multiple Translator teams will contribute expertise to these sets, but the results will be centralized.
    4. SRI will provide tools for disseminating these equivalent identifiers, drawing on the prior work of multiple Translator teams.
  6. Edge Predicates
    1. Relationships between entities (edges) have a predicate indicating the specific type of relationship between the entities.
    2. The biolink model will contain a set of predicates (biolink predicates) used to bridge across pre-existing predicate vocabularies.
    3. The biolink model will designate a set of such vocabularies that can be mapped to biolink predicates. These vocabularies are called biolink-mapped.
    4. Predicates in ReasonerAPI messages and KGX files must be biolink predicates.
    5. Non-ReasonerAPI, SmartAPI-registered KPs must provide sufficient information via the registry for clients to determine the predicate of each response as an identifier from a biolink-mapped vocabulary.
    6. As a best practice, KPs should map ingested predicates to a biolink-mapped vocabulary as precisely as possible, and rely on tools to convert these predicates into biolink predicates.
    7. The SRI will provide mapping tools to perform this conversion.
  7. KPs may score answers (provide scores in the message); ARAs are required to score answers.
  8. KPs should not call other KPs.
  9. KPs that implement the Translator Reasoner API must perform the following kinds of reasoning in answering queries:
    1. Making identifiers more specific, e.g. responding to a query involving an entity with information related to a subclass of that entity.
    2. Making categories in a query more specific, e.g. responding to a query for a biolink:NamedThing with a particular biolink:ChemicalSubstance.
    3. Making predicates more specific, e.g. responding to a query for “affects expression of” with an edge with predicate “increases expression of”.
    4. Inverting predicates, e.g. responding to a query with predicate P with an edge whose predicate is the inverse of P.
  10. ARAs obtain biomedical data only via KPs (or other ARAs), not from locally-cached aggregated graphs or non-Translator data sources.
  11. Aggregated graphs must be created at the consortium level and exposed as a KP.
  12. Components that do not fulfill the responsibilities of KPs and ARAs can still be stand-alone elements of the architecture to provide particular functionality; such tools will use the Translator ReasonerAPI whenever possible.
  13. Answer persistence will be the responsibility of the ARS.
  14. A system-wide UI will (eventually) exist and will allow users to interpret answers and reformulate questions.
  15. The SmartAPI Registry will serve as a Translator Registry, and will expose programmatically accessible metadata about KPs and ARAs.
    1. All REST-style SmartAPI KPs must be registered in the Translator Registry.
    2. All Translator Reasoner API KPs must be registered in the Translator Registry. All metadata for Translator Reasoner APIs must be available via endpoints at the service, from which it will be extracted by the SmartAPI Registry.
    3. All KGX files intended for graph transfer must be registered in the Translator Registry. All metadata for KGX files must be contained in associated metadata files and exposed via an API, which will be consumed by the SmartAPI Registry.
    4. All ARAs must be registered in the Translator Registry. The ARS will not require a separate registration.
    5. Each type of component must provide the metadata described here.
    6. Non-KP, non-ARA components, such as normalizers, must also be registered and provide metadata appropriate to their API type.
    7. The SmartAPI Registry will provide a unified query system, returning information about all three API methods. This query system will allow ARAs to locate the appropriate KPs.
    8. SRI will guarantee that metadata standards across the components allow such a unified query system.
    9. The SmartAPI Registry will allow components to find KPs by querying for biolink predicates. It will also allow components to query by predicates from biolink-mapped vocabularies, returning the KPs that provide such metadata.
  16. A continuous integration framework will consume metadata from the registry, and provide automated testing and reports.
  17. Both KPs and ARAs should acquire and transmit provenance information to the fullest possible extent.
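
The communication pattern in principle 2 (the ARS broadcasts to ARAs, ARAs query KPs, responses flow back) can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `Message` dictionary is a stand-in and not the real ReasonerAPI schema, and `make_kp`, `ARA`, `ars_broadcast`, the `EXAMPLE:` identifiers, and the constant score are all hypothetical names and values chosen for the example.

```python
from typing import Any, Callable, Dict, List

# Stand-in for a ReasonerAPI Message; the real schema is richer than this.
Message = Dict[str, Any]
KP = Callable[[Message], Message]


def make_kp(name: str, edges: List[Dict[str, str]]) -> KP:
    """Build a toy KP that answers from a fixed edge list and calls no other KP."""
    def answer(query: Message) -> Message:
        hits = [e for e in edges if e["predicate"] == query["predicate"]]
        return {"source": name, "results": hits}
    return answer


class ARA:
    """Toy ARA: forwards the query to its KPs, then scores and merges the results."""

    def __init__(self, name: str, kps: List[KP]) -> None:
        self.name = name
        self.kps = kps

    def answer(self, query: Message) -> Message:
        results = []
        for kp in self.kps:
            response = kp(query)
            for edge in response["results"]:
                results.append({
                    "edge": edge,
                    "provenance": response["source"],  # which KP supplied the edge
                    "score": 1.0,  # placeholder; a real ARA would compute a score
                })
        return {"source": self.name, "results": results}


def ars_broadcast(query: Message, aras: List[ARA]) -> List[Message]:
    """Toy ARS: send the same query to every ARA and collect the responses."""
    return [ara.answer(query) for ara in aras]


# Hypothetical data: one KP with a single matching edge, one KP with none.
kp_a = make_kp("kp_a", [
    {"subject": "EXAMPLE:1", "predicate": "biolink:treats", "object": "EXAMPLE:2"},
])
kp_b = make_kp("kp_b", [])
responses = ars_broadcast({"predicate": "biolink:treats"}, [ARA("ara_1", [kp_a, kp_b])])
```

In this sketch the KPs never call one another and the ARA attaches both a score and the originating KP to every result, mirroring principles 7, 8, and 17.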
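
Principle 4 requires that CURIEs be expandable to full IRIs through a JSON-LD context. A minimal sketch of that expansion follows. The `CONTEXT` dictionary is a hand-written, illustrative subset standing in for the biolink-model context file, and `expand_curie` is a hypothetical helper, not part of any Translator tool.

```python
from typing import Mapping

# Illustrative prefix map standing in for the biolink-model JSON-LD context;
# the real context file defines many more prefixes.
CONTEXT = {
    "MONDO": "http://purl.obolibrary.org/obo/MONDO_",
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "biolink": "https://w3id.org/biolink/vocab/",
}


def expand_curie(curie: str, context: Mapping[str, str] = CONTEXT) -> str:
    """Expand a CURIE such as 'MONDO:0005148' to a full IRI using a prefix map."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE (no ':' separator): {curie!r}")
    if prefix not in context:
        raise ValueError(f"unknown prefix {prefix!r} in CURIE {curie!r}")
    return context[prefix] + local_id


iri = expand_curie("MONDO:0005148")
```

Raising an error on an unknown prefix, rather than passing the string through, makes non-expandable identifiers visible early, which is one way to enforce the "must be expandable" requirement.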
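
Two of the reasoning requirements in principle 9, making predicates more specific and inverting predicates, can be illustrated with a small matching function. The `PARENT` and `INVERSE` tables below are hypothetical fragments written for this example and are not taken from the biolink model itself; `match_direction` is likewise an assumed helper name.

```python
from typing import Dict, Optional

# Hypothetical fragment of a predicate hierarchy (child -> parent).
PARENT: Dict[str, str] = {
    "biolink:increases_expression_of": "biolink:affects_expression_of",
    "biolink:decreases_expression_of": "biolink:affects_expression_of",
}

# Hypothetical inverse table (predicate -> its inverse).
INVERSE: Dict[str, str] = {
    "biolink:treats": "biolink:treated_by",
    "biolink:treated_by": "biolink:treats",
}


def is_same_or_narrower(predicate: str, target: str) -> bool:
    """True if predicate equals target or is one of its descendants."""
    current: Optional[str] = predicate
    while current is not None:
        if current == target:
            return True
        current = PARENT.get(current)  # walk up; None once the root is passed
    return False


def match_direction(query_predicate: str, edge_predicate: str) -> Optional[str]:
    """Return 'forward' if the edge answers the query as stored, 'inverted' if it
    answers the query once subject and object are swapped, otherwise None."""
    if is_same_or_narrower(edge_predicate, query_predicate):
        return "forward"
    inverse = INVERSE.get(query_predicate)
    if inverse is not None and is_same_or_narrower(edge_predicate, inverse):
        return "inverted"
    return None
```

A query for the broader predicate is satisfied by an edge carrying a narrower one, but not the reverse: an "affects expression of" edge does not answer a query that asks specifically for "increases expression of".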

Diagram

image

image