3d-reorganization-prostate-cancer

Code, analysis, and results for Hawley, Zhou, et al., Cancer Research, 2021.

Primary language: HTML. License: GNU General Public License v3.0 (GPL-3.0).

Reorganization of the 3D genome pinpoints non-coding drivers of primary prostate tumors

This repository contains all the data and analysis related to Reorganization of the 3D genome pinpoints non-coding drivers of primary prostate tumors.

The published version of the paper is available in Cancer Research.

A preprint version of this article is available on bioRxiv.

A reproducible run of this work can be found on CodeOcean.

Usage

To download all the code, scripts, and results, use git clone:

git clone https://github.com/LupienLab/3d-reorganization-prostate-cancer.git

This does not download the raw sequencing data. The repository contains placeholder folders for the raw data; the FASTQ files themselves are available from the European Genome-Phenome Archive (EGA) under the accession numbers below.

| Data Type | EGA Accession Number |
| --- | --- |
| Whole genome sequencing | EGAS00001000900 |
| RNA-seq | EGAS00001000900 |
| ChIP-seq (H3K27ac) | EGAS00001002496 |
| Hi-C | EGAS00001005014 |

Processed Hi-C data can be found on the Gene Expression Omnibus (GEO) under accession GSE164347.
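
As an illustration of locating these files, the sketch below builds the directory URL where GEO conventionally serves a series' supplementary files. The helper name `geo_suppl_url` is hypothetical, and it is an assumption that the processed Hi-C files for GSE164347 are deposited as supplementary files at that location; check the GEO record to confirm before relying on it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the GEO supplementary-file directory URL
# for a series accession such as GSE164347.
geo_suppl_url() {
    local acc="$1"
    local digits="${acc#GSE}"   # numeric part of the accession
    local stub
    # GEO groups series into folders named by replacing the last
    # three digits with "nnn" (e.g. GSE164347 -> GSE164nnn).
    if [ "${#digits}" -le 3 ]; then
        stub="GSEnnn"
    else
        stub="GSE${digits%???}nnn"
    fi
    printf 'https://ftp.ncbi.nlm.nih.gov/geo/series/%s/%s/suppl/\n' "$stub" "$acc"
}

# Example (not run here): fetch everything in that directory with wget.
#   wget -r -np -nd "$(geo_suppl_url GSE164347)"
```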

Raw Hi-C data from other studies can be found in the repositories and under the accession numbers below.

| Data | Repository | Accession Number |
| --- | --- | --- |
| 22Rv1, RWPE1, and C4-2B | GEO | GSE118629 |
| H1-hESC (Rep 1) | 4D Nucleome | 4DNFI6HDY7WZ |
| H1-hESC (Rep 2) | 4D Nucleome | 4DNFITH978XV |
| HAP-1 (Rep 1) | 4D Nucleome | 4DNFIT64Q7A3 |
| HAP-1 (Rep 2) | 4D Nucleome | 4DNFINSKEZND |
| GM12878 (Rep 1) | 4D Nucleome | 4DNFIIV4M7TF |
| GM12878 (Rep 2) | 4D Nucleome | 4DNFIXVAKX9Q |

Project Structure

This repository is structured as follows:

.
├── data/            # directory where all non-analysis data is stored
│   ├── External/    # data from other papers, collaborators
│   ├── Raw/         # raw data generated for this specific project along with pre-processing scripts and data
│   └── Processed/   # data from `Raw/` that has been aggregated or processed in some other way beyond the standard raw pre-processing
├── code/
│   ├── Result1/     # analysis scripts and logs for `result1`
│   ├── Result2/     # analysis scripts and logs for `result2`
│   └── ...
├── results/
│   ├── Result1/     # results for `result1`
│   ├── Result2/     # results for `result2`
│   └── ...
├── README.md        # this file
└── environment.yaml # Anaconda environment YAML file for the entire project

To re-run any of the analyses in the code/ folders:

  1. Build and activate the conda environment stored in environment.yaml
    conda env create --file environment.yaml -n <ENV_NAME>
    conda activate <ENV_NAME>
  2. Navigate to the folder for the result of interest under code/
  3. Run snakemake

This should regenerate the full set of results for that folder. To preview what needs to be run before starting the analyses, do a dry run with snakemake -n.
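
The steps above can be sketched as a small shell helper. This is a minimal sketch, assuming the conda environment is already active and that the chosen folder contains a Snakefile; the function name `rerun_result` and the example path `code/Result1` are illustrative and not part of the repository. Recent Snakemake releases require a `--cores` value for a real run, so any extra arguments are passed through to both calls.

```shell
#!/usr/bin/env bash
# Hypothetical helper: preview, then re-run, the Snakemake workflow
# in one analysis folder. Extra arguments go to snakemake unchanged.
rerun_result() {
    if [ "$#" -lt 1 ]; then
        echo "usage: rerun_result <result_dir> [snakemake options]" >&2
        return 2
    fi
    local result_dir="$1"
    shift
    if [ ! -d "$result_dir" ]; then
        echo "error: no such directory: $result_dir" >&2
        return 1
    fi
    (
        # Work inside a subshell so the caller's directory is unchanged.
        cd "$result_dir" || exit 1
        # Dry run: list the jobs that would be executed, without running them.
        snakemake -n "$@" || exit 1
        # Real run: regenerate the outputs for this folder.
        snakemake "$@"
    )
}

# Example (not run here):
#   rerun_result code/Result1 --cores 4
```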