Load file from private Google Cloud bucket
DevangThakkar opened this issue · 4 comments
Hi,
I was wondering if it is possible to load a BAM file from a Google Cloud bucket. I tried loading a public BAM (the example code with only the BAM location replaced) and that didn't seem to work. I understand that igv.js is able to load files from private Google Cloud Storage buckets if we provide it with the requisite credentials. Would it be possible to extend that to igv-reports as well?
> create_report test/data/variants/variants.vcf.gz \
http://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/hg38/hg38.fa \
--ideogram test/data/hg38/cytoBandIdeo.txt \
--flanking 1000 --info-columns GENE TISSUE TUMOR COSMIC_ID GENE SOMATIC \
--samples reads_1_fastq --sample-columns DP GQ \
--tracks test/data/variants/variants.vcf.gz gs://genomics-public-data/NA12878.chr20.sample.bam test/data/hg38/refGene.txt.gz \
--output examples/example_vcf.html
[E::hts_open_format] Failed to open file gs://genomics-public-data/NA12878.chr20.sample.bam
Traceback (most recent call last):
  File "/usr/local/bin/create_report", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/igv_reports/report.py", line 345, in main
    create_report(args)
  File "/usr/local/lib/python3.6/dist-packages/igv_reports/report.py", line 84, in create_report
    reader = utils.getreader(config, None, args.fasta)
  File "/usr/local/lib/python3.6/dist-packages/igv_reports/utils.py", line 13, in getreader
    return bam.BamReader(path)
  File "/usr/local/lib/python3.6/dist-packages/igv_reports/bam.py", line 11, in __init__
    header = pysam.view(*args)
  File "/usr/local/lib/python3.6/dist-packages/pysam/utils.py", line 75, in __call__
    stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools view: failed to open "gs://genomics-public-data/NA12878.chr20.sample.bam" for reading: Protocol not supported\n'
@DevangThakkar Can you do this in python using pysam? If you can, igv-reports could be modified to do this. See the file igv_reports/bam.py, this is where alignments are read.
The "gs" protocol will likely not be recognized by pysam. However, mapping "gs" to "https" is a simple matter of parsing the bucket and object name from the gs: URL and then adding the parameter "alt=media". In JavaScript this looks like
`https://storage.googleapis.com/storage/v1/b/${bucket}/o/${object}?alt=media`
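For reference, the same mapping could be sketched in Python roughly as follows. This is only an illustration: the function name `gs_to_https` is made up here and is not part of igv-reports or pysam. Unlike the one-liner above, it also percent-encodes the object name, since the JSON API expects an object name containing "/" to be encoded in the path.

```python
from urllib.parse import quote


def gs_to_https(gs_url):
    """Map a gs://bucket/object URL to a Cloud Storage JSON API media URL.

    Hypothetical helper for illustration; not an igv-reports function.
    """
    prefix = "gs://"
    if not gs_url.startswith(prefix):
        raise ValueError("not a gs:// URL: " + gs_url)

    # Everything up to the first "/" is the bucket, the rest is the object name.
    bucket, _, obj = gs_url[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError("expected gs://<bucket>/<object>: " + gs_url)

    # safe="" makes quote() encode "/" as %2F, so nested object names
    # stay a single path segment.
    return "https://storage.googleapis.com/storage/v1/b/{}/o/{}?alt=media".format(
        bucket, quote(obj, safe="")
    )


print(gs_to_https("gs://genomics-public-data/NA12878.chr20.sample.bam"))
```

Whether pysam can then open the resulting https URL depends on how the underlying htslib was built, so that part would still need to be tested.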
@DevangThakkar Have you had a chance to experiment with pysam? The gs -> https mapping is trivial; the challenge here is doing OAuth in Python. I'm curious what you have in mind here for "passing credentials". You cannot, of course, just pass a username and password. Did you have in mind an access token? I'm not sure how you would do that securely.
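To make the access token question concrete, here is a rough sketch of how a token is usually attached to a Cloud Storage request: as an `Authorization: Bearer` header. The function name and the `GCS_ACCESS_TOKEN` environment variable are assumptions made up for this example, not something igv-reports or pysam defines. The sketch only builds the request and does not send it. It also does not show how to get pysam itself to send such a header, which is the harder part.

```python
import os
import urllib.request


def authorized_request(https_url, token=None):
    """Build (but do not send) a request carrying a bearer token.

    Hypothetical helper; GCS_ACCESS_TOKEN is an assumed variable name.
    """
    # Reading the token from the environment keeps it off the command line.
    token = token or os.environ.get("GCS_ACCESS_TOKEN")

    request = urllib.request.Request(https_url)
    if token:
        request.add_header("Authorization", "Bearer " + token)
    return request


req = authorized_request(
    "https://storage.googleapis.com/storage/v1/b/my-bucket/o/my.bam?alt=media",
    token="example-token",
)
print(req.get_header("Authorization"))
```

An access token is short-lived, so taking it from the environment rather than storing it in the report or in a config file limits exposure, but it does not settle the question of how to handle it securely.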
Ahh yes, perfect.