
TCGA benchmarking Docker declarations

This repository contains the OpenEBench TCGA benchmarking Docker declarations, which define the architecture of the benchmarking workflows implemented in OpenEBench.

NOTE for developers: in order to keep the workflow containers reproducible and stable in the long term, make sure to pin the container base image to a specific version (e.g. ubuntu:16.04, NOT ubuntu:latest).

Structure

Our benchmarking workflow is composed of three Docker images / steps:

  1. Validation: the input file format is checked and, if required, the content of the file is validated. The validation generates a participant dataset (see the first sketch after this list). In order to create datasets with a structure compatible with the Elixir Benchmarking Data Model, please use the following python module and JSON schema
  2. Metrics_computation: the predictions are compared with the 'Gold Standards' provided by the community, which, in this case, results in two performance metrics - precision (Positive Predictive Value) and recall (True Positive Rate). These metrics are written to assessment datasets (see the second sketch after this list). In order to create datasets with a structure compatible with the Elixir Benchmarking Data Model, please use the following python module and JSON schema
  3. Consolidation: the benchmark itself is performed by merging the assessment metrics with the rest of the TCGA data. The results are provided in SVG format (scatter plot) and in JSON format (aggregation/summary datasets), which are also compatible with the Elixir Benchmarking Data Model (see the third sketch after this list).
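
As a first sketch, here is one way the validation step could serialize a participant dataset. The field names below are assumptions loosely based on the Elixir Benchmarking Data Model, not the exact OpenEBench schema; for real workflows, use the python module and JSON schema referenced above.

```python
import json

# Hypothetical participant dataset; every field name here is an
# assumption for illustration, not the authoritative OpenEBench schema.
participant_dataset = {
    "_id": "TCGA:2018-04-05_ACC_participant_MyTool",
    "community_id": "TCGA",
    "challenge_id": ["ACC"],
    "type": "participant",
    "participant_id": "MyTool",
    "datalink": {"validated": True},
}

with open("participant_dataset.json", "w") as handle:
    json.dump(participant_dataset, handle, indent=4)
```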
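The second sketch shows how the metrics-computation step could derive the two metrics named above from a prediction list and a gold standard, using their textbook definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)). The function and the gene identifiers are illustrative, not the actual module's API.

```python
def compute_metrics(predicted, gold_standard):
    """Precision (PPV) and recall (TPR) of a set of predicted
    identifiers against a community-provided gold standard."""
    predicted, gold_standard = set(predicted), set(gold_standard)
    true_positives = len(predicted & gold_standard)
    # |predicted| = TP + FP and |gold_standard| = TP + FN
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold_standard) if gold_standard else 0.0
    return precision, recall

print(compute_metrics(["TP53", "KRAS", "EGFR"], ["TP53", "KRAS", "BRCA1"]))
# (0.6666..., 0.6666...)
```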
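The third sketch illustrates the consolidation step's SVG output, assuming the assessment datasets have already been reduced to one (precision, recall) pair per participant; matplotlib stands in here for whatever plotting code the real container uses, and the data points are made up.

```python
import matplotlib
matplotlib.use("Agg")  # headless rendering, suitable inside a container
import matplotlib.pyplot as plt

# Illustrative data only: one (precision, recall) pair per participant.
results = {"ToolA": (0.71, 0.58), "ToolB": (0.64, 0.80)}

fig, ax = plt.subplots()
for tool, (precision, recall) in results.items():
    ax.scatter(precision, recall, label=tool)
ax.set_xlabel("Precision (PPV)")
ax.set_ylabel("Recall (TPR)")
ax.legend()
fig.savefig("benchmark_scatter.svg")  # SVG, as in the real workflow output
```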

Find more information about the TCGA Cancer Drivers Pipeline here.

TCGA sample files

Usage

In order to build the Docker images locally, run ./build.sh 1.0.3, where the argument is the version tag given to the images.