Plot base level coverage
Closed this issue · 9 comments
I've received feedback from users interested in being able to plot the base-level coverage for a set of samples in recount2.
One way of doing this is via derfinderPlot::plotRegionCoverage()
. I could write a function where the user specifies:
- (1) a list of sample ids (
run
for SRA and GTEx,gdc_file_id
for TCGA), - (2) a GRanges with the regions of interest (which would have to be short -- or the user would be warned) and
- (3) colors and labels for each sample (ideally representing different groups).
Then internally run derfinder::loadCoverage()
accessing the data from either local bigwigs (if at JHPCE, SciServer or if the user downloaded them) or via the web. Something that could be a bit of a bottle neck is the annotation needed for derfinderPlot::plotRegionCoverage()
. That would normally involve creating a genomic state object from Gencode v25 hg38 which takes time even for one chromosome. I could pre-make them split by chr and have the function download the files from the web behind the scenes. In any case, right now this approach would have to deal with
rafalab/bumphunter#15 since derfinderPlot::plotRegionCoverage()
requires the output of bumphunter::matchGenes()
.
Another option would be to open the data in the UCSC genome browser or something like that.
A third option could be to use epivizr
http://bioconductor.org/packages/release/bioc/vignettes/epivizr/inst/doc/IntroToEpivizr.html. Although I'm not sure that it can read bigwig files from the web.
Thoughts @andrewejaffe @jtleek ?
Right now, users can get the list of URLs from recount::recount_url
and load them into IGV or similar programs.
I think we should talk to Hector Corrada Bravo about getting epivzr to do this - I don't think we should write a function here. I think we should write a tutorial. @lcolladotor
I'm going to see what I can do with https://twitter.com/hcorrada/status/844896126578970625.
Here's what I could get working with epivizr
:
Code at https://gist.github.com/lcolladotor/d02064583733754dea7828a984378c96.
Some notes:
- The
epivizr
window on Google Chrome didn't show the menus well. - The print function to save a screenshot in pdf format didn't work (only tried it from Firefox).
- The default gene annotation in
epivizr
seems to be from hg19. - If the user moves the window, they'll have to re-import the coverage data and update the charts in
epivizr
.
Ahh, I forgot to mention that I couldn't get this to work:
status <- colData(rse)$group
index <- split(seq(along = status), status)
logCounts <- log2(assays(rse)$counts + 1)
means <- sapply(index, function(ind) rowMeans(logCounts[, ind]))
rse_viz <- SummarizedExperiment(means, rowRanges = rowRanges(rse))
mean_plot <- app$plot(rse_viz, datasource_name = 'log 2 mean counts', columns = c('induced', 'uninduced'))
It's based on the epivizr
vignette code.
Hi,
Here is another option, using the UCSC genome browser. Note that the bigwig files in this case are not normalized to the same library size. So the visual comparison is not ideal. It's possible to do so, but that involves making new bigwig files with wiggletools
or similar tools. The upside is that it's easy to change the window range. For example, here is a screenshot after I clicked on "zoom out 10x". Also, the user doesn't have to download any base-pair data from recount2
.
The code is at https://gist.github.com/lcolladotor/c022472ca2fd7199e13fe61d0765096c and I used the options specified at https://ycl6.gitbooks.io/rna-seq-data-analysis/visualization.html for building the tracks file. Notably smoothingWindow=4
.
If there are too many samples in a project, another option is to visualize the mean. We do have mean bigwig files (normalized to the same library size), so comparing data from different projects would be easy with either the UCSC genome browser or with epivizr
.
Best,
Leo
http://bioconductor.org/packages/devel/bioc/html/wiggleplotr.html looks interesting (noticed via https://twitter.com/seandavis12/status/847628575339536386). From http://bioconductor.org/packages/devel/bioc/vignettes/wiggleplotr/inst/doc/wiggleplotr.html it looks like you can specify a scaling factor.
Thanks @lcolladotor for linking me to this conversation. I think it would be a great application for wiggleplotr. Internally, wiggleplotr uses rtracklayer::BigWigSelection and rtracklayer::import.bw functions to import read coverage from bigWig files (see the readCoverageFromBigWig function in here: https://github.com/kauralasoo/wiggleplotr/blob/master/R/functions.R). If rtracklayer support remote bigBig files then they should also work with wiggleplotr.
Let me know if you have any questions regarding the usage of wiggleplotr and I'll be happy to help. I'm travelling until Tuesday but will have more time after that.
The tutorial using derfinderPlot
is part of https://github.com/LieberInstitute/recountWorkflow. I mentioned wiggleplotr and epivizR as other options. Showing how to use all the different tools could be a workflow itself, but for now that is out of the scope of the recount workflow I wrote. In any case, the workflow addresses this issue, so I'll close it for now.