Analysis examples based on the ISB-CGC hosted TCGA data, using R and R Markdown.
To install:
library("devtools")
install_github("isb-cgc/examples-R", build_vignettes=TRUE)
To view and run the vignettes.
library(ISBCGCExamples)
help(package="ISBCGCExamples")
Please move to the "tcga_201607_beta" dataset, or even better, the newest GDC datasets "TCGA_hg19_data_v0", "TCGA_hg38_data_v0", and "TCGA_bioclin_v0".
Some of these examples are using the alpha dataset that is now unavailble. If you see a dataset that begins with "tcga_201510_alpha", then try "tcga_201607_beta", and it's likely to work. We will be updating these over time.
If you want to move to the newest datasets (recommended), be aware that some of the most common column names have changed to match the GDC's schemas. For example, "Study" is now "project_short_name". "ParticipantBarcode" is not "case_barcode". "SampleBarcode" is now "sample_barcode". Overall the column names have become all lower case. Please get in touch if you're having trouble.
If you are having trouble with the OAuth, see the OAuth section below!
To authenticate on a remote server, you need to use out-of-band authentication (OOB). The httr package has an option for this. Set "options(httr_oob_default=TRUE)" after loading bigrquery, but before calling query_exec(), and you should be good to go. [citation: https://support.rstudio.com/hc/en-us/articles/217952868-Generating-OAuth-tokens-from-a-server]
There are vignettes for each TCGA data type, and more elaborate examples involving analyzing genomic data, correlating gene expression and methylation, and correlating protein and mRNA levels.
The vignettes as R-markdown can be found in the examples-R/inst/doc directory, which can serve as examples of using builtin BigQuery functions like Pearson correlation, or even how to implement more complex functions like Spearmans correlation. Queries can be simple character vectors, or standalone files. Results are returned as data.frames using the bigrquery package to interact with the servers.
The SQL files used in the vignettes can be found at examples-R/inst/sql. These are parsed and dispatched with arguments using the DisplayAndDispatchQuery function, found in the file of the same name in examples-R/R.
Using the API endpoints to work with barcode lists
Expression and Copy Number Correlation
Expression and Methylation Correlation
Expression and Protein Correlation
Processing Raw Data with Bioconductor
Bioconductor provides an excellent set of docker containers which include R, RStudio Server, and the sets of Bioconductor packages appropriate for certain use cases.
This R package is also available in a Docker container derived from bioconductor/release_core
:
b.gcr.io/isb-cgc-public-docker-images/r-examples
It can be run like so:
docker run -p 8787:8787 -v YOUR_LOCAL_DIRECTORY:/home/rstudio/data \
b.gcr.io/isb-cgc-public-docker-images/r-examples:latest
and then navigate to http://localhost:8787 on your local machine.
For more details, see examples-R/inst/docker and http://www.bioconductor.org/help/docker/.
Then log into Rstudio with username and password 'rstudio', for more details: https://github.com/rocker-org/rocker/wiki/Using-the-RStudio-image
If you have trouble with the OAuth, see examples-R/inst/doc/BigQueryIntroduction.html for some instructions on resetting it.
There was an incompatibility between bigrquery and the httr library. If you are having trouble, try installing the development version of bigrquery or use the prior version of httr (1.0.0).
To install the dev version of bigrquery:
https://github.com/rstats-db/bigrquery
install.packages('devtools')
devtools::install_github("rstats-db/bigrquery")