Integration of MRI and RNAseq data in the Imagene Project

Integration of MRI and RNAseq data of the Imagene project. Paper published in Radiology: "Radiogenomic analysis of breast cancer by linking MRI phenotypes with tumor gene expression." https://doi.org/10.1148/radiol.2020191453.

To make a virtual environment for R and Python usage:

make requirements

Then you can process the data, train models and run analyses:

make all

Remote files

Data is not included in this repository. The download method and location of the data can be specified in the config/snakemake.yaml file.

Pathway Analysis Requirements

The pathway analyses requires the msigdb files release 5.2. These cannot be downloaded automatically, as registration is required. Before running the pathway analysis, download msigdb_v5.2_files_to_download_locally.zip from the msigdb site under Archived Releases and place it into data/external/msigdb. The scripts will extract the required files from the archive.

Project Organization

├── LICENSE
│
├── Makefile           <- Makefile with commands like `make all` and `make venv`.
│
├── Snakefile          <- Snakemake file with rules to run this project. These rules
│                         should be run inside a virtual environment containing the packages
│                         in requirements.txt (Python) and requirement.R (R). The Makefile can
│                         set this environment up on some systems.
│
├── README.md          <- The top-level README for developers using this project.
│
├── docs               <- Project Documentation using Sphinx.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── config/snakemake.yaml <- Configuration of the workflow. Set version of Python and R used here.
│
├── data
│   ├── raw            <- The original, immutable data dump.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   ├── external       <- Data from third party sources, such as reference data.
│   └── to_share       <- Data for third parties.
│
├── notebooks          <- Jupyter notebooks for exploration. Naming convention is date (for
│                         ordering), the creator's initials, and a short `-` delimited
│                         description, e.g. `2017-01-01-tb-initial-data-exploration`.
│
├── models             <- Trained models and model predictions.
│
├── analyses           <- Results and summaries of statistical analyses.
│
├── reports            <- Analysis reports, such as pweave or rmarkdown reports.
│   └── figures        <- Generated graphics and figures to be used in reporting.
├── figures            <- Figures for in the manuscript.
│
├── venv/              <- Suggested directory for virtual environment
│
├── requirements.txt   <- The Python requirements file for reproducing the analysis
│                         environment e.g. generated with `pip freeze > requirements.txt`.
├── requirements.R     <- Requirements and installation script for R.
│
└── src                <- Source code for use in this project.
    │
    ├── plot.py        <- Plotting library.
    ├── util.py        <- Utility function library.
    │
    ├── analysis       <- Scripts for statistical analysis of data or models.
    │
    ├── data           <- Scripts to download or generate data.
    │
    ├── features       <- Scripts to construct features from data.
    │
    ├── models         <- Scripts to train and apply models.
    │
    ├── reports        <- Source of reports, such as snippets used in multiple reports.
    │
    └── visualization  <- Scripts to visualize results.

Project based on the cookiecutter data science project template. #cookiecutterdatascience

NKI-CCB/imagene-analysis

Integration of MRI and RNAseq data in the Imagene Project

Remote files

Pathway Analysis Requirements

Project Organization