examples-R

Analysis examples based on the ISB-CGC hosted TCGA data, using R and R Markdown.

License: Apache License 2.0


To install:

  library("devtools")
  install_github("isb-cgc/examples-R", build_vignettes=TRUE)

To view and run the vignettes:

  library(ISBCGCExamples)
  help(package="ISBCGCExamples")

Alpha tables are no longer available!

Please move to the "tcga_201607_beta" dataset or, even better, to the newest GDC datasets: "TCGA_hg19_data_v0", "TCGA_hg38_data_v0", and "TCGA_bioclin_v0".

Some of these examples use the alpha dataset, which is now unavailable. If you see a dataset name that begins with "tcga_201510_alpha", try "tcga_201607_beta" instead; the query will likely work. We will be updating these examples over time.
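Because the fix is usually just a change of dataset name, it can be automated on the query text. A minimal sketch, assuming the query is held in an R character string and refers to tables in legacy SQL form as [isb-cgc:tcga_201510_alpha.<table>]; the helper migrate_to_beta and the table name example_table are hypothetical:

```r
# Hypothetical helper (not part of ISBCGCExamples): point a query that uses the
# retired alpha dataset at the beta dataset instead.
migrate_to_beta <- function(query) {
  # fixed = TRUE treats the pattern as a literal string, not a regular expression
  gsub("tcga_201510_alpha", "tcga_201607_beta", query, fixed = TRUE)
}

# "example_table" is a placeholder table name used only for illustration
old_query <- "SELECT * FROM [isb-cgc:tcga_201510_alpha.example_table] LIMIT 10"
new_query <- migrate_to_beta(old_query)
print(new_query)
# [1] "SELECT * FROM [isb-cgc:tcga_201607_beta.example_table] LIMIT 10"
```

This only renames the dataset; queries that also need the new GDC column names require further edits.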

If you want to move to the newest datasets (recommended), be aware that some of the most common column names have changed to match the GDC's schemas. For example, "Study" is now "project_short_name", "ParticipantBarcode" is now "case_barcode", and "SampleBarcode" is now "sample_barcode". Overall, column names are now all lowercase. Please get in touch if you're having trouble.
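When porting existing R code, the renamed columns can be handled in one place. A minimal sketch, assuming query results arrive as a data.frame; the helper rename_columns and the toy data are hypothetical, and only the three renames listed above are covered:

```r
# Mapping from old column names to the GDC-style names (only the renames
# mentioned in this README; other columns may also have changed).
old_to_new <- c(
  Study              = "project_short_name",
  ParticipantBarcode = "case_barcode",
  SampleBarcode      = "sample_barcode"
)

# Hypothetical helper: rename any old-style columns, leave the rest untouched.
rename_columns <- function(df, mapping = old_to_new) {
  hits <- names(df) %in% names(mapping)
  names(df)[hits] <- unname(mapping[names(df)[hits]])
  df
}

# Toy data frame for illustration only
toy <- data.frame(Study = "BRCA", SampleBarcode = "sample-1", value = 1.5)
renamed <- rename_columns(toy)
print(names(renamed))  # project_short_name, sample_barcode, value
```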

OAuth

If you are having trouble with OAuth, see the OAuth section below!

vignettes

There are vignettes for each TCGA data type, and more elaborate examples involving analyzing genomic data, correlating gene expression and methylation, and correlating protein and mRNA levels.

The R Markdown sources of the vignettes are in the examples-R/inst/doc directory. They serve as examples of using built-in BigQuery functions such as Pearson correlation, and of implementing more complex functions such as Spearman's correlation. Queries can be simple character vectors or standalone files. Results are returned as data.frames, using the bigrquery package to communicate with the BigQuery servers.
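A minimal sketch of the two ways to supply a query, assuming a placeholder table name (example_table) and a placeholder billing project ID ("your-project-id"). The BigQuery call is commented out because it needs network access and OAuth; it uses bq_project_query() and bq_table_download() from bigrquery 1.0 or later, which may differ from the calls the vignettes themselves make:

```r
# 1. A query written inline as a character vector
inline_query <- "SELECT COUNT(*) AS n FROM [isb-cgc:tcga_201607_beta.example_table]"

# 2. The same query kept in a standalone .sql file and read back as one string
sql_file <- tempfile(fileext = ".sql")
writeLines(c("SELECT COUNT(*) AS n",
             "FROM [isb-cgc:tcga_201607_beta.example_table]"), sql_file)
file_query <- paste(readLines(sql_file), collapse = "\n")

# Either string can then be sent to BigQuery. The bracketed table reference is
# legacy SQL, so legacy mode is requested explicitly (requires credentials):
# tb <- bigrquery::bq_project_query("your-project-id", file_query,
#                                   use_legacy_sql = TRUE)
# df <- bigrquery::bq_table_download(tb)  # returns a data.frame (tibble)
```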

The SQL files used in the vignettes are in examples-R/inst/sql. They are parsed and dispatched with arguments by the DisplayAndDispatchQuery function, defined in the file of the same name in examples-R/R.
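The idea of filling a SQL template with arguments before dispatching it can be illustrated with plain string substitution. This is a hypothetical sketch, not the package's DisplayAndDispatchQuery; the helper fill_sql_template, the placeholder names _DATASET_ and _STUDY_, and the table name example_table are made up for illustration:

```r
# Hypothetical sketch: replace each named placeholder in a SQL template with
# its argument value, then display the final query before it would be sent.
fill_sql_template <- function(sql, replacements = list()) {
  for (placeholder in names(replacements)) {
    # fixed = TRUE so placeholder text is matched literally, not as a regex
    sql <- gsub(placeholder, replacements[[placeholder]], sql, fixed = TRUE)
  }
  sql
}

template <- "SELECT * FROM [_DATASET_.example_table] WHERE project_short_name = '_STUDY_'"
query <- fill_sql_template(template,
                           list("_DATASET_" = "isb-cgc:TCGA_bioclin_v0",
                                "_STUDY_"   = "TCGA-BRCA"))
cat(query, "\n")  # display step; dispatching would then use bigrquery
```

Keeping the SQL in files with placeholders lets the same query be reused across studies or datasets by changing only the arguments.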

Intro to the CGC

BigQuery Introduction

TCGA Annotations

Creating TCGA cohorts part 1

Creating TCGA cohorts part 2

Using the API endpoints to work with barcode lists

Constructing small matrices

Available data types

microRNA expression

Copy Number segments

DNA Methylation

Protein expression

Somatic Mutations

mRNAseq gene expression

Advanced examples

DESeq2 workflow on raw data

Expression and Copy Number Correlation

Expression and Methylation Correlation

Expression and Protein Correlation

Genomic And Expression T-test

Using Docker

Processing Raw Data with Bioconductor

Bioconductor provides an excellent set of Docker containers that include R, RStudio Server, and sets of Bioconductor packages suited to particular use cases.

This R package is also available in a Docker container derived from bioconductor/release_core:

  b.gcr.io/isb-cgc-public-docker-images/r-examples

It can be run like so:

  docker run -p 8787:8787 -v YOUR_LOCAL_DIRECTORY:/home/rstudio/data \
    b.gcr.io/isb-cgc-public-docker-images/r-examples:latest

and then navigate to http://localhost:8787 on your local machine and log into RStudio with username and password 'rstudio'. For more details on the RStudio image, see https://github.com/rocker-org/rocker/wiki/Using-the-RStudio-image

For more details on the Docker setup, see examples-R/inst/docker and http://www.bioconductor.org/help/docker/.

OAuth

If you have trouble with OAuth, see examples-R/inst/doc/BigQueryIntroduction.html for instructions on resetting it.

NOTE: There was an incompatibility between bigrquery and the httr package. If you are having trouble, try installing the development version of bigrquery, or use the earlier version of httr (1.0.0).

To install the development version of bigrquery (https://github.com/rstats-db/bigrquery):

  install.packages('devtools')
  devtools::install_github("rstats-db/bigrquery")