extract reads specific to a barcode
Closed this issue · 3 comments
ersgupta commented
Hi,
Would it be possible to have a utility to extract reads from a BAM file for a specific BC+UMI and gene? This would help a lot if we need to dig deeper into certain cells, etc.
Thanks
Saurabh
jamesnemesh commented
While we don’t have a tool explicitly designed to extract exactly what you’re looking for, you can get close with the following:
FilterBamByTag can filter to a single cell barcode or a set of barcodes. FilterBamByTag can also filter to a UMI sequence. You can then use a unix tool like grep to filter on a specific gene name. You can even pipe these operations together. Let’s say I wanted to look at gene A2M on cell barcode TTCTCCTTCACTATTC with UMI ACCCGGTCCG. I’d first make a BAM specific to the cell and UMI sequences:
FilterBamByTag I=12.bam O=/dev/stdout TAG=XC TAG_VALUE=TTCTCCTTCACTATTC COMPRESSION_LEVEL=0 | FilterBamByTag I=/dev/stdin TAG=XM TAG_VALUE=ACCCGGTCCG O=TTCTCCTTCACTATTC:ACCCGGTCCG.bam
Then you could look at that output bam with samtools or IGV:
samtools view TTCTCCTTCACTATTC:ACCCGGTCCG.bam | grep A2M
HCJWLDMXX:1:1452:31177:20917 16 12 9221359 255 80M902N18M * 0 0 TTCACTATGGCTGGTTTCAGATCTCTTACTGGGACATCTTGCAGAACCGTGAAGAACAAGCTCAGTGTCTGATTTGACACCTTATCAAGGTAAATCAA FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF XC:Z:TTCTCCTTCACTATTC MD:Z:98 GE:Z:A2M XF:Z:CODING PG:Z:STAR.I RG:Z:HCJWL.1.2 XG:Z:A2M NH:i:1 NM:i:0 XM:Z:ACCCGGTCCG UQ:i:0 AS:i:98 GS:Z:-
(This bam only has 21 reads, so it’s quite compact)
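Note that a plain grep for A2M matches the gene name anywhere on the line. If a stricter match on the tags themselves is wanted, a minimal Python sketch along these lines could be applied to the text output of samtools view. It is not part of Drop-seq tools; the function names are hypothetical, the example record is a shortened placeholder rather than real data, and it assumes the gene name sits in the GE tag as in the read shown above (other BAMs may use a different tag).

```python
def parse_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    # The first 11 columns of a SAM record are mandatory; tags follow them.
    for field in fields[11:]:
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def matches(sam_line, cell_barcode, umi, gene, gene_tag="GE"):
    """True if the record carries exactly this cell barcode, UMI and gene.

    XC and XM are the tags used in the commands above; gene_tag defaults
    to GE, which is an assumption based on the example read.
    """
    tags = parse_tags(sam_line)
    return (
        tags.get("XC") == cell_barcode
        and tags.get("XM") == umi
        and tags.get(gene_tag) == gene
    )


# Placeholder record: sequence and quality are shortened for illustration.
record = "\t".join([
    "read1", "16", "12", "9221359", "255", "4M", "*", "0", "0",
    "ACGT", "FFFF",
    "XC:Z:TTCTCCTTCACTATTC", "GE:Z:A2M", "XM:Z:ACCCGGTCCG",
])

print(matches(record, "TTCTCCTTCACTATTC", "ACCCGGTCCG", "A2M"))
```

In practice the lines could be read from standard input, for example by piping samtools view into a script that calls matches on each line.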
Typically when I want to look at exemplar data, I extract a BAM for a particular cell barcode, then look at the data in IGV and color reads by UMI sequence. You can pull a few cells and look at them side by side in IGV, and you can move freely about the genome to look at many genes.
-Jim
ersgupta commented
Great. This would do. Would it be possible to incorporate ED=1 in this?
jamesnemesh commented
This extracts raw read data, so there are no ED manipulations of the data, sorry.
-Jim
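Since the tag filter matches values exactly, one possible workaround for anyone who also wants reads whose UMI differs by a single base is to list the UMIs observed for the cell and pick out those close to the target, then filter on each of them. The sketch below is only an illustration and not part of Drop-seq tools. The function names are hypothetical, the observed UMI list is made up, and it assumes ED means substitutions only (Hamming distance), which may not match how edit distance is applied elsewhere in the pipeline.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def umis_within_distance(target, observed_umis, max_distance=1):
    """Return the distinct observed UMIs within max_distance of target.

    UMIs of a different length are skipped, because Hamming distance
    is only defined for equal-length sequences.
    """
    return sorted(
        umi
        for umi in set(observed_umis)
        if len(umi) == len(target)
        and hamming_distance(target, umi) <= max_distance
    )


# Made-up XM values for one cell barcode, for illustration only.
observed = ["ACCCGGTCCG", "ACCCGGTCCA", "TTTTGGTCCG", "ACCCGGTCCG"]
print(umis_within_distance("ACCCGGTCCG", observed))
```

Each UMI returned could then be used as a tag value in the filtering step described earlier in the thread.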