/cam-pipeline

Data loading pipeline for CAM database

Primary language: Jupyter Notebook. License: MIT.

AOP-CAM Knowledge Provider database pipeline

This repository contains a makefile and resources for constructing the RDF triplestore used in the AOP-CAM Knowledge Provider prototype for NCATS Data Translator. Source code for the web service API used to query this data can be found at TranslatorIIPrototypes/cam-api. The web service API can be tested from within its Swagger API documentation. The current version of this database can be queried with SPARQL using the endpoint https://stars-app.renci.org/cam/sparql.
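As a rough illustration of querying the endpoint, the sketch below wraps curl in a small helper. The helper name cam_query, the variable names, and the sample query are assumptions made for this example and do not come from the repository. It also assumes the endpoint accepts standard SPARQL 1.1 Protocol GET requests with a query parameter, which has not been verified here.

```shell
# Endpoint taken from the text above.
CAM_SPARQL_ENDPOINT='https://stars-app.renci.org/cam/sparql'

# Illustrative query (an assumption, not from the repository):
# fetch five arbitrary triples.
CAM_SAMPLE_QUERY='SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5'

# cam_query is a hypothetical helper. It sends the query given as its first
# argument with HTTP GET and asks for results in SPARQL JSON format.
cam_query() {
  curl -sS -G "$CAM_SPARQL_ENDPOINT" \
    --data-urlencode "query=$1" \
    -H 'Accept: application/sparql-results+json'
}
```

With the helper defined, running cam_query "$CAM_SAMPLE_QUERY" should print a JSON result set if the endpoint is reachable. Keeping a LIMIT clause in exploratory queries avoids pulling large result sets from a shared service.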

Development of the prototype is led by Jim Balhoff (RENCI, UNC-Chapel Hill) and Stephen Edwards (RTI International), funded by the National Center for Advancing Translational Sciences.

Data sources

Ontologies

This resource integrates a broad set of ontologies developed as part of the OBO Library. The complete list can be seen in the OWL imports declarations within https://github.com/TranslatorIIPrototypes/cam-pipeline/blob/master/ontologies.ofn.
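One way to pull the imported ontology IRIs out of a local copy of that file is sketched below. It assumes the file is in OWL functional syntax, where each import is written as Import(<IRI>), and that each declaration sits on a single line. The function name list_owl_imports is hypothetical.

```shell
# list_owl_imports is a hypothetical helper. It prints one imported IRI per
# line from an OWL functional-syntax file given as its first argument.
list_owl_imports() {
  # Match each Import(<...>) declaration, then keep only the IRI inside <>.
  grep -oE 'Import\( *<[^>]+> *\)' "$1" | sed -E 's/.*<([^>]+)>.*/\1/'
}
```

For example, list_owl_imports ontologies.ofn run from a checkout of the repository would list the imports, assuming the file matches the layout described above.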

Data

The data sets integrated into the database consist of independent OWL instance models making use of the terms and relations defined within the combined ontology (as described in 'Gene Ontology Causal Activity Modeling (GO-CAM) moves beyond GO annotations to structured descriptions of biological functions and systems'). Currently three main types of data are included:

Building the database

Prerequisites

Running

  • First ensure enough memory is available for all the commands:
       export JVM_ARGS=-Xmx256G
       export ROBOT_JAVA_ARGS=-Xmx256G
       export JAVA_OPTS=-Xmx256G
  • Run make cam-db-reasoned.jnl
  • Wait a few days...
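Because the build runs for days, it may help to start it detached from the terminal and log its output. The following is a minimal sketch of the steps above, not part of the repository: CAM_HEAP, start_cam_build, and the log file name are hypothetical, and the makefile itself does not read CAM_HEAP.

```shell
# CAM_HEAP is a hypothetical override for this sketch; it defaults to the
# 256G heap size used in the steps above.
CAM_HEAP="${CAM_HEAP:-256G}"
export JVM_ARGS="-Xmx${CAM_HEAP}"
export ROBOT_JAVA_ARGS="-Xmx${CAM_HEAP}"
export JAVA_OPTS="-Xmx${CAM_HEAP}"

# start_cam_build is a hypothetical helper. It runs the make target in the
# background under nohup so it survives a closed terminal session, and
# writes stdout and stderr to the given log file (default: cam-build.log).
start_cam_build() {
  log="${1:-cam-build.log}"
  nohup make cam-db-reasoned.jnl > "$log" 2>&1 &
  echo "build started with PID $!, logging to $log"
}
```

Calling start_cam_build from the repository root starts the build, and tail -f cam-build.log can then be used to follow its progress.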