CA2M: chromatin accessibility and mutations in cancer genomes

Predicting regional mutation burden in cancer genomes using chromatin accessibility (CA) and replication timing (RT)

This repository includes source code, tutorials, and processed datasets for the study:

Chromatin accessibility of primary human cancers ties regional mutational processes and signatures with tissues of origin.

Oliver Ocsenas and Jüri Reimand (2022), in revision.

Tutorials - Jupyter notebooks

  • 1_BigWigtoWindow.ipynb - mapping chromatin signals to megabase-scale windows
  • 2_MAFtoWindow.ipynb - mapping cancer mutations to megabase-scale windows
  • 3_CA2M_RF.ipynb - random forest models predicting megabase-scale mutation burden from chromatin accessibility and replication timing
  • 4_CA2M_RF_FeatureSelection_Tutorial.ipynb - selecting significant features predicting mutation rates
  • 5_CA2M_RF_SHAPscores.ipynb - computing feature importance scores (SHAP)
  • 6_CA2M_RF_EnrichedMutations_Tutorial.ipynb - detecting genomic regions with enriched mutations that are not explained by chromatin and replication timing alone
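
To illustrate what mapping mutations to megabase-scale windows involves, here is a minimal Python sketch. It is not the code from 2_MAFtoWindow.ipynb. The function name `bin_mutations`, the (chromosome, position) input format, the 1-based coordinates, and the fixed 1,000,000-bp window width are all assumptions made for illustration.

```python
from collections import Counter

WINDOW_SIZE = 1_000_000  # assumed window width: 1 Mb


def bin_mutations(mutations, window_size=WINDOW_SIZE):
    """Count mutations per (chromosome, window start).

    `mutations` is an iterable of (chromosome, position) pairs with
    1-based positions (an assumed input format, not the MAF layout).
    """
    counts = Counter()
    for chrom, pos in mutations:
        # 1-based coordinate of the first base in the window containing pos.
        start = ((pos - 1) // window_size) * window_size + 1
        counts[(chrom, start)] += 1
    return counts


# Hypothetical toy input: three mutations on chr1, one on chr2.
example = [
    ("chr1", 1),
    ("chr1", 1_000_000),
    ("chr1", 1_000_001),
    ("chr2", 2_500_000),
]
window_counts = bin_mutations(example)
```

In this toy input, the first two chr1 mutations fall into the window starting at position 1, and the third falls into the next window. Windows without mutations are simply absent from the counter, so a full analysis would also need the list of all windows to record zero counts.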

Tutorials/data - files needed for tutorials

  • All_CA_RT_100KB_scale.csv.gz - CA and RT tracks for cancer and normal samples, 100-kb resolution
  • All_CA_RT_1MB_scale.csv.gz - CA and RT tracks for cancer and normal samples, 1-Mb resolution
  • NormalCA_RT_MBscale.csv.gz - CA and RT tracks for normal tissues and cell lines, 1-Mb resolution
  • PCAWG_SNVbinned_100KB_scale.csv.gz - mutation burden in whole cancer genomes, 100-kb resolution
  • PCAWG_SNVbinned_MBscale.csv.gz - mutation burden in whole cancer genomes, 1-Mb resolution
  • PCAWG_breastcancer_SNV.MAF.gz - example file of somatic mutations in breast cancer for creating files above
  • SHAP_plot.pdf - example plot of feature importance scores (SHAP)
  • TCGA_BRCA_ATACSeq_chr1_2.bw - example file of chromatin accessibility in breast cancer for creating files above (chrs 1-2 only)
  • TumorCA_RT_MBscale.csv.gz - CA and RT tracks for cancer samples, 1-Mb resolution
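
The .csv.gz files above can be inspected without decompressing them first. The following is a minimal sketch using only the Python standard library. The helper name `peek_csv_gz` is hypothetical, and the column layout of the files is not assumed: the function only returns whatever header and rows it finds.

```python
import csv
import gzip


def peek_csv_gz(path, n_rows=5):
    """Return the header and the first n_rows data rows of a gzipped CSV."""
    # Text mode ("rt") lets the csv module read decoded lines directly.
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        # zip stops after n_rows, so the rest of the file is never read.
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

For example, assuming the script is run from the repository root, `peek_csv_gz("Tutorials/data/All_CA_RT_1MB_scale.csv.gz")` would return the column names and the first five rows as lists of strings.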

All_code - entire code repository for the project; use at your own risk

Contact: oocsenas [@] oicr.on.ca ; juri.reimand [@] utoronto.ca