YosefLab/scone

Cell Ranger alignment stats

Opened this issue · 3 comments

Hi there,
I have some 10X data that I'd like to try scone on. I'm trying to find alignment QC metrics in the Cell Ranger output files. Does Cell Ranger output the alignment QC metrics you reported in Table S2 of the paper (i.e., unmapped_reads, umi_corrected, etc.)?

Thanks,
ian

Hi @iwilliams91,

I only have a vague recollection of what we did, but if I remember correctly we had to extract the metrics from a non-obvious location in the Cell Ranger output.

@mbcole performed the analysis and might remember more?

@iwillham were you able to find the answer to your question? I am stuck on the same problem.

@iwillham and @asmariyaz23 Not sure if you are still looking for a solution here, but I've made a little progress with this. So far I've been able to find 1) unmapped_reads and 2) num_reads.

You can count the mapped reads per cell barcode from your .bam file using samtools and awk. (Note: I'm using Unix commands to find these barcodes. I think you can find equivalent commands for Mac or PC.)

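# Count mapped reads per cell barcode; -F 4 excludes reads flagged as unmapped.
samtools view -F 4 possorted_genome_bam.bam | awk '
# Find the CB tag and tally the barcode sequence (any "-1" suffix is dropped).
match($0,/CB:Z:[ACGT]*/) {
    a[substr($0,RSTART+5,RLENGTH-5)]++
}
# Print one "barcode count" pair per line.
END {
    for(i in a)
        print i,a[i]
}' >> /mapped_reads_per_barcode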

Output (first 10 lines):
GAAACTCTCGCAAACT | 14
ACATACGTCTCATTCA | 7
GATCGCGAGAACAATC | 4
CACACTCAGAAGGTGA | 18
TGCACCTAGTCCGGTC | 22889
GGACATTAGGATGTAT | 9
GACCAATCACATTCGA | 1
GAACCTATCAGAAATG | 6
AGCTCTCGTACACCGC | 13
CACAGTAAGCGCCTCA | 1043

You can then subset this per-barcode count list to the verified (cell-associated) barcodes from Cell Ranger.
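As a rough sketch of that subsetting step, something like the following should work. It assumes the counts file has one "barcode count" pair per line (as the awk command above prints) and that the verified barcodes are in a plain-text file with one barcode per line, possibly carrying a suffix such as "-1". The function name `subset_counts` and the file names in the usage comment are placeholders of mine, not anything Cell Ranger or scone defines.

```shell
# subset_counts BARCODES COUNTS
# Print only the rows of COUNTS ("<barcode> <count>") whose barcode appears
# in BARCODES (one barcode per line). A trailing numeric suffix such as "-1"
# is stripped from the listed barcodes so they match the bare sequences.
subset_counts() {
    awk '
        # First file: remember each verified barcode (suffix removed).
        FILENAME == ARGV[1] { sub(/-[0-9]+$/, "", $1); keep[$1] = 1; next }
        # Second file: print the row only if its barcode was remembered.
        ($1 in keep)
    ' "$1" "$2"
}

# Hypothetical usage (both paths are placeholders):
# subset_counts barcodes.tsv mapped_reads_per_barcode > mapped_reads_filtered
```

If your barcode list is gzipped, decompress it first. Rows come out in the order of the counts file.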

For unmapped reads, replace the first command line with 'samtools view -f 4 possorted_genome_bam.bam' (-f 4 keeps only reads flagged as unmapped) and write the result to a separate output file.

num_reads would then just be the per-barcode sum of the two tables, once both are trimmed to the same barcodes and put in the same order.
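Here is a minimal sketch of that summing step, assuming the mapped table contains only mapped reads (otherwise unmapped reads get counted twice) and that both tables use the "barcode count" layout. The function name `sum_counts` and the file names are placeholders. Keying on the barcode means the two tables don't need to be in the same order beforehand, and a barcode missing from one table is treated as zero there.

```shell
# sum_counts MAPPED UNMAPPED
# Add up the per-barcode counts from two "<barcode> <count>" tables and
# print "<barcode> <total>" sorted by barcode.
sum_counts() {
    awk '
        # Accumulate counts from both files under the same barcode key.
        { total[$1] += $2 }
        END { for (bc in total) print bc, total[bc] }
    ' "$1" "$2" | LC_ALL=C sort
}

# Hypothetical usage (paths are placeholders):
# sum_counts mapped_reads_filtered unmapped_reads_filtered > num_reads_per_barcode
```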

@mbcole Does that sound right?