IBVL

Repo organization

This repo is intended for Wasserman lab members working on data processing for IBVL.

The repo is organized into sub-folders, one for each aspect of the data processing.

Metadata tracking

This section concerns metadata tracking, i.e. tracking the information associated with each sample.

One of the tools being considered for metadata tracking is OpenCGA; refer to the openCGA folder for more information.

Nextflow Scripts

This section concerns the scripts used to generate the IBVL.

The scripts are wrapped with Nextflow to provide traceability and reproducibility. To review or comment on the scripts, refer to the script folder.

Import directory

How to run an import:

1. Copy the import/.env-sample file to import/.env and set the values appropriately.
2. (Optional) If needed, run python tables.py to create the tables. The database should be empty before you do this.
3. Run python orchestrate.py to start the migration.
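The first step above (filling in import/.env) assumes a settings file of KEY=VALUE lines. The sketch below shows how such a file is typically read, assuming the common dotenv convention (one KEY=VALUE per line, "#" comments, optional surrounding quotes). It is a simplified illustration, not the loader the import scripts actually use, and the function names are hypothetical.

```python
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse dotenv-style KEY=VALUE text into a dict (simplified sketch)."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # ignore malformed lines that have no '='
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env(path: str = "import/.env") -> dict[str, str]:
    """Read and parse the .env file at `path` (hypothetical helper)."""
    return parse_env_text(Path(path).read_text())
```

Usage: `settings = load_env("import/.env")` returns a dict of the values you set. A real project would more likely rely on an existing dotenv library, which also handles cases this sketch skips (escapes, `export` prefixes, variable expansion).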

The script creates a directory called "jobs" and, inside it, one numbered directory per run: "1" the first time, "2" the second time, and so on.
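The numbering scheme described above can be sketched as: look at the existing numeric folders under "jobs", take the highest number, and create the next one. This is a minimal illustration assuming folders are named with plain integers; it is not the actual code in orchestrate.py, and `create_next_job_dir` is a hypothetical name.

```python
from pathlib import Path


def create_next_job_dir(base: str = "jobs") -> Path:
    """Create and return the next numbered job directory under `base`."""
    base_path = Path(base)
    base_path.mkdir(parents=True, exist_ok=True)  # creates "jobs" on the first run
    # Collect the numbers of existing job folders; ignore anything non-numeric.
    existing = [
        int(p.name) for p in base_path.iterdir() if p.is_dir() and p.name.isdecimal()
    ]
    next_id = max(existing, default=0) + 1
    job_dir = base_path / str(next_id)
    job_dir.mkdir()  # raises FileExistsError rather than reusing a folder
    return job_dir
```

Numbering from the highest existing folder (rather than counting folders) means a deleted job folder in the middle never causes a later run to collide with a surviving one.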

Each job folder contains the migration's working data and two output logs (one for errors, one for progress). For each model, the working data consists of just two things: a file holding the latest primary key, and a reverse lookup map from entity ID (e.g. a gene, variant, or transcript ID) to primary key.
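The per-model working data described above could look roughly like the sketch below, assuming the script assigns primary keys sequentially itself. The class name, file names, and JSON format are hypothetical choices for illustration; the repo's actual files may differ.

```python
import json
from pathlib import Path


class ModelState:
    """Working data for one model within a job folder (hypothetical sketch).

    Holds the latest primary key and a reverse lookup map from entity ID
    (e.g. a gene, variant, or transcript ID) to primary key.
    """

    def __init__(self, job_dir: Path, model: str):
        # Hypothetical file names; one pair of files per model.
        self.pk_file = job_dir / f"{model}_last_pk.txt"
        self.map_file = job_dir / f"{model}_id_to_pk.json"
        # Resume from previously saved state in this job folder, if any.
        self.last_pk = int(self.pk_file.read_text()) if self.pk_file.exists() else 0
        self.id_to_pk: dict[str, int] = (
            json.loads(self.map_file.read_text()) if self.map_file.exists() else {}
        )

    def get_or_assign_pk(self, entity_id: str) -> int:
        """Return the known primary key for entity_id, or assign the next one."""
        if entity_id not in self.id_to_pk:
            self.last_pk += 1
            self.id_to_pk[entity_id] = self.last_pk
        return self.id_to_pk[entity_id]

    def save(self) -> None:
        """Persist the latest primary key and the lookup map to the job folder."""
        self.pk_file.write_text(str(self.last_pk))
        self.map_file.write_text(json.dumps(self.id_to_pk))
```

The reverse map is what lets rows in other models (for example, a transcript referring to its gene) be linked by primary key without querying the database for each entity ID.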

Import environment vars