
Curated list of TCGA resources

GNU General Public License v3.0 (GPL-3.0)

Awesome TCGA

Curated list of awesome resources to access data from The Cancer Genome Atlas (TCGA) project, with a particular focus on computational tools allowing pan-TCGA analysis and/or giving access to the results of such tools.

Official links

General information

Data repositories

Downloading the data

List of command-line tools, APIs, and R packages for downloading the data.

Official tools

Broad Institute GDAC

The Broad TCGA Data and Analyses (Broad GDAC) Firehose provides TCGA Level 3 data and Level 4 analyses packaged in a form amenable to immediate algorithmic analysis. It is a useful resource for accessing analysis results that the GDC does not produce (e.g. MutSig2CV, correlation with clinical variables, mRNA clustering). Firehose automatically and systematically runs the software typically seen in TCGA publications. Note, however, that its somatic variant calls are currently based on the older hg19 TCGA data.

  • Firehose - Refers to the computational infrastructure.
  • Firebrowse - A web UI to visualise the results of the analyses performed by Firehose.

Others

The GDC hosts a list of such tools: https://gdc.cancer.gov/access-data/gdc-community-tools.

  • TCGAbiolinks - An R/Bioconductor package to search, download and prepare relevant data for analysis in R. Very powerful and well documented.
  • GDC Spreadsheet Download Tool - Tool to download clinical and/or biospecimen metadata for a given set of files in a tab-delimited format.
  • GenomicDataCommons - An R/Bioconductor package for querying, accessing, and mining genomic datasets available from the GDC.
  • gdctools - Broad Institute Python and UNIX CLI utilities to simplify search and retrieval of open-access data from the GDC.
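The tools above all wrap the GDC REST API in some way. As a rough illustration of what such a query looks like, here is a minimal Python sketch that builds a request for open-access files from one TCGA project. It assumes the public `https://api.gdc.cancer.gov/files` endpoint and its JSON filter syntax; the function names (`build_filters`, `build_query_url`, `fetch_file_listing`) and the chosen fields are illustrative and not part of any tool listed here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GDC files endpoint (assumed here; check the GDC API docs for changes).
GDC_FILES_ENDPOINT = "https://api.gdc.cancer.gov/files"


def build_filters(project_id):
    """Return a GDC filter selecting open-access files of one project."""
    return {
        "op": "and",
        "content": [
            {
                "op": "in",
                "content": {
                    "field": "cases.project.project_id",
                    "value": [project_id],
                },
            },
            {"op": "in", "content": {"field": "access", "value": ["open"]}},
        ],
    }


def build_query_url(project_id, size=5):
    """Build the GET URL for a small file listing (illustrative helper)."""
    params = {
        "filters": json.dumps(build_filters(project_id)),
        "fields": "file_id,file_name,data_type",
        "format": "JSON",
        "size": str(size),
    }
    return GDC_FILES_ENDPOINT + "?" + urlencode(params)


def fetch_file_listing(project_id, size=5):
    """Query the GDC and return the list of matching file records.

    Requires network access; not called by default.
    """
    with urlopen(build_query_url(project_id, size)) as response:
        return json.load(response)["data"]["hits"]


if __name__ == "__main__":
    # Print the URL only; call fetch_file_listing("TCGA-BRCA") to run the query.
    print(build_query_url("TCGA-BRCA"))
```

For anything beyond a quick look-up, the packages listed above are likely the better choice, since they also handle pagination, downloads and data preparation.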

Cloud computing

List of cloud computing facilities hosting the TCGA data.

Pan-TCGA analyses

List of analyses performed in a consistent manner on all (or at least several) TCGA datasets, where the results are freely available.

  • Firehose - See above for the associated tools to download the data. Firehose runs many software tools on all TCGA cohorts and makes the results available.
  • Tumor Fusion Gene Data Portal - 9,966 tumor samples from 33 TCGA cancer types and 689 normal samples from 19 TCGA normal tissue types, analyzed with the PRADA pipeline on realigned BAM files of RNA-seq data.
  • DriverDBv2 - WES and RNA-seq reanalysis to identify driver genes. Provides a nice graphical summary of mutation clustering in genes (e.g. for TP53).
  • ChimerDB - A comprehensive database of fusion genes encompassing analysis of deep sequencing data (including TCGA) and manual curations.
  • ASCAT Ploidy and Purity Estimates - COSMIC hosts a tab-separated table listing the ploidy and aberrant cell fraction (purity estimate) for TCGA samples re-analysed using ASCAT.
  • BioXpress - RNA-seq-derived gene expression database, including TCGA among others.
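Several of the resources above, such as the ASCAT ploidy and purity table, are distributed as plain tab-separated files. The sketch below shows one way to load such a table in Python using only the standard library. The column names (`sample`, `ploidy`, `aberrant_cell_fraction`) and the two example rows are made-up placeholders, not the actual COSMIC header or data; adjust them to match the downloaded file.

```python
import csv
import io


def load_estimates(handle):
    """Read a tab-separated table into {sample: (ploidy, purity)}.

    Column names are hypothetical; rename them to match the real file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    estimates = {}
    for row in reader:
        estimates[row["sample"]] = (
            float(row["ploidy"]),
            float(row["aberrant_cell_fraction"]),
        )
    return estimates


if __name__ == "__main__":
    # Placeholder rows for illustration only, not real TCGA values.
    example = io.StringIO(
        "sample\tploidy\taberrant_cell_fraction\n"
        "SAMPLE-A\t2.0\t0.75\n"
        "SAMPLE-B\t3.5\t0.5\n"
    )
    for sample, (ploidy, purity) in load_estimates(example).items():
        print(sample, ploidy, purity)
```

With a real download, the same function can be called on an open file, for example `with open(path, newline="") as fh: load_estimates(fh)`, where `path` points to the saved table.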

Publications