kharchenkolab/dropEst

10X v3 Unable to parse UMI

namitc opened this issue · 4 comments

Hi,

I'm trying to run the dropEst pipeline on a 10X dataset, but it gives me an "Unable to parse UMI in " error. Below are the command, a record from the BAM file, and the config file that I'm using.

./dropest -m -V -b -o sample3_dropest -g ~/rnaseq/refdata-cellranger-GRCh38-3.0.0/genes/genes.gtf -L eiEIBA -c configs/10x.xml ~/rnaseq/sample3/outs/possorted_genome_bam.bam

A00524:70:HHF7HDRXX:2:1234:20238:34037 256 1 12067 0 91M * 0 0 GGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTCTGCATGTAACTTAATACCA FF:FFFFFF:FFFFFFFFFFFFFFFFFFF::FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFF NH:i:7 HI:i:2 AS:i:89 nM:i:0 RE:A:I li:i:0 BC:Z:TTGGCATA QT:Z:F:F,,,,F CR:Z:TACATTCTGCAACACT CY:Z:FFFFFFFFFFFFFFFF UR:Z:GGGGACACTCAC UY:Z:FFFFFFFFFFFF UB:Z:GGGGACACTCAC RG:Z:sample3:0:1:HHF7HDRXX:2
What am I doing wrong?
10x.txt
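One way to see which barcode and UMI lengths the Cell Ranger BAM actually carries is to read the optional tags of a record. Below is a minimal Python sketch. It assumes the record is a SAM text line such as `samtools view` prints. The helper `parse_sam_tags` is hypothetical and not part of dropEst.

```python
def parse_sam_tags(sam_line: str) -> dict:
    """Map each optional TAG:TYPE:VALUE field of a SAM line to its value."""
    # SAM columns are tab-separated; fall back to whitespace for copies
    # (like the one quoted above) where the tabs were lost.
    fields = sam_line.split("\t") if "\t" in sam_line else sam_line.split()
    tags = {}
    for field in fields[11:]:  # columns 1-11 are the mandatory SAM fields
        tag, _type, value = field.split(":", 2)  # value may itself contain ':'
        tags[tag] = value
    return tags


# The record from the issue, with SEQ and QUAL replaced by "*" for brevity.
record = "\t".join([
    "A00524:70:HHF7HDRXX:2:1234:20238:34037", "256", "1", "12067", "0", "91M",
    "*", "0", "0", "*", "*",
    "NH:i:7", "HI:i:2", "AS:i:89", "nM:i:0", "RE:A:I", "li:i:0",
    "BC:Z:TTGGCATA", "QT:Z:F:F,,,,F", "CR:Z:TACATTCTGCAACACT",
    "CY:Z:FFFFFFFFFFFFFFFF", "UR:Z:GGGGACACTCAC", "UY:Z:FFFFFFFFFFFF",
    "UB:Z:GGGGACACTCAC", "RG:Z:sample3:0:1:HHF7HDRXX:2",
])

tags = parse_sam_tags(record)
print("raw cell barcode length:", len(tags["CR"]))  # 16
print("UMI length:", len(tags["UB"]))  # 12
print("has CB tag:", "CB" in tags)  # False
```

This particular record is a secondary alignment (FLAG 256) and carries `CR` but no `CB`, so a single record is not necessarily representative of the whole file.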

I've got the same problem. I guess there is no config XML file provided for 10x v3.

JFYI, I use the following 10x_v3.xml to process all my 10x v3 datasets, and it has worked fine for me. I found it somewhere in the issue threads of this GitHub repository.

<config>
    <!-- droptag -->
    <TagsSearch>
        <protocol>10x</protocol>
        <BarcodesSearch>
            <barcode1_length>8</barcode1_length>
            <barcode2_length>16</barcode2_length>
            <umi_length>12</umi_length>
            <r1_rc_length>0</r1_rc_length>
        </BarcodesSearch>

        <Processing>
            <min_align_length>10</min_align_length>
            <reads_per_out_file>10000000</reads_per_out_file>
            <poly_a_tail>AAAAAAAA</poly_a_tail>
        </Processing>
    </TagsSearch>

    <!-- dropest -->
    <Estimation>
        <Merge>
            <barcodes_file>../data/barcodes/10x_aug_2016_split</barcodes_file>
            <barcodes_type>const</barcodes_type>
            <min_merge_fraction>0.2</min_merge_fraction>
            <max_cb_merge_edit_distance>2</max_cb_merge_edit_distance>
            <max_umi_merge_edit_distance>1</max_umi_merge_edit_distance>
            <min_genes_after_merge>100</min_genes_after_merge>
            <min_genes_before_merge>20</min_genes_before_merge>
        </Merge>

        <PreciseMerge>
            <max_merge_prob>1e-5</max_merge_prob>
            <max_real_merge_prob>1e-7</max_real_merge_prob>
        </PreciseMerge>
        <BamTags> <!-- Optional. Tags used to parse the .bam file (-f option) or to print the tagged .bam file (-b or -F options). Default values correspond to the 10x protocol. -->
            <cb>CB</cb> <!-- Cell barcode. Default: CB. -->
            <cb_raw>CR</cb_raw> <!-- Cell barcode raw. Used only for bam output. Default: CR. -->
            <umi>UB</umi> <!-- UMI. Default: UB. -->
            <umi_raw>UR</umi_raw> <!-- UMI raw. Used only for bam output. Default: UR. -->
            <gene>GX</gene> <!-- Gene id. Default: GX. -->
            <cb_quality>CQ</cb_quality> <!-- Cell barcode quality. Default: CQ. -->
            <umi_quality>UQ</umi_quality> <!-- UMI quality. Default: UQ. -->
            <Type> <!-- Tag that contains the read type. If not specified, all reads with gene info are considered exonic. -->
                <tag>XF</tag>
                <intronic>INTRONIC</intronic> <!-- Value corresponding to intronic reads. Default value for bam output is INTRONIC. -->
                <intergenic>INTERGENIC</intergenic> <!-- Value corresponding to intergenic reads. All reads that have a gene id and the intergenic mark are considered intergenic. Default value for bam output is INTERGENIC. -->
                <exonic>EXONIC</exonic> <!-- Value corresponding to exonic reads. If not specified, all reads with other tags that have a gene id are considered exonic. Default value for bam output is EXONIC. -->
            </Type>
        </BamTags>
    </Estimation>
</config>
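The `<BamTags>` section above names the tags dropEst reads from the BAM file. The minimal Python sketch below uses only the standard library's `xml.etree.ElementTree`. It lists which of those configured tags are absent from a given record. The helper `missing_tags` is hypothetical, and the config is trimmed to the single-valued entries of `<BamTags>`.

```python
import xml.etree.ElementTree as ET

# <BamTags> block from the config above, trimmed to the single-valued tags.
CONFIG = """<config><Estimation><BamTags>
  <cb>CB</cb> <cb_raw>CR</cb_raw> <umi>UB</umi> <umi_raw>UR</umi_raw>
  <gene>GX</gene> <cb_quality>CQ</cb_quality> <umi_quality>UQ</umi_quality>
</BamTags></Estimation></config>"""

# Optional fields of the BAM record quoted in the issue.
RECORD_TAGS = ("NH:i:7 HI:i:2 AS:i:89 nM:i:0 RE:A:I li:i:0 BC:Z:TTGGCATA "
               "QT:Z:F:F,,,,F CR:Z:TACATTCTGCAACACT CY:Z:FFFFFFFFFFFFFFFF "
               "UR:Z:GGGGACACTCAC UY:Z:FFFFFFFFFFFF UB:Z:GGGGACACTCAC "
               "RG:Z:sample3:0:1:HHF7HDRXX:2")


def missing_tags(config_xml: str, record_tags: str) -> dict:
    """Return {role: tag} for configured BamTags absent from the record."""
    bam_tags = ET.fromstring(config_xml).find("./Estimation/BamTags")
    present = {field.split(":", 1)[0] for field in record_tags.split()}
    return {
        child.tag: child.text.strip()
        for child in bam_tags
        if len(child) == 0 and child.text  # skip nested blocks such as <Type>
        and child.text.strip() not in present
    }


for role, tag in missing_tags(CONFIG, RECORD_TAGS).items():
    print(f"{role}: {tag} not found in this record")
```

Running the same check on a few more records from the file would show whether a missing tag is specific to one alignment or systematic.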

It looks like the only difference between 10x.xml and 10x_v3.xml is the following change:

diff 10x.xml 10x_v3.xml
8c8
<             <umi_length>10</umi_length>
---
>             <umi_length>12</umi_length>
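Because the diff comes down to `umi_length`, one quick sanity check is to compare that value with the length of an actual UMI from the BAM file. In the config above, `<umi_length>` sits under `<TagsSearch>/<BarcodesSearch>`, the block commented as the droptag section. The minimal sketch below uses Python's `xml.etree.ElementTree`. It assumes trimmed configs and takes the UMI string from the `UB` tag of the record quoted in the issue. `configured_umi_length` and `umi_length_matches` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def configured_umi_length(config_xml: str) -> int:
    """Read <umi_length> from the <TagsSearch>/<BarcodesSearch> block."""
    node = ET.fromstring(config_xml).find("./TagsSearch/BarcodesSearch/umi_length")
    if node is None or node.text is None:
        raise ValueError("config has no <umi_length>")
    return int(node.text)


def umi_length_matches(config_xml: str, umi: str) -> bool:
    """True if the configured UMI length equals the length of an observed UMI."""
    return configured_umi_length(config_xml) == len(umi)


# Trimmed configs differing only in umi_length, as in the diff above.
V2_CONFIG = ("<config><TagsSearch><BarcodesSearch>"
             "<umi_length>10</umi_length>"
             "</BarcodesSearch></TagsSearch></config>")
V3_CONFIG = V2_CONFIG.replace("<umi_length>10<", "<umi_length>12<")

observed_umi = "GGGGACACTCAC"  # UB tag of the record quoted in the issue
print(umi_length_matches(V2_CONFIG, observed_umi))  # False
print(umi_length_matches(V3_CONFIG, observed_umi))  # True
```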

I can confirm that I am having this issue again, and I am not sure what changed. Previously, correcting umi_length fixed it; now I get the same error despite the correction. Any pointers on how to fix this would be greatly appreciated.