Teichlab/tracer

KeyError message in assemble

micans opened this issue · 5 comments

Hi, below is the KeyError that I get with tracer assemble. It affects only one out of 1536 samples; otherwise the software is running very nicely. Below the error I have copied a listing of the output files that were generated; in particular, the fastq files are non-empty (they are uncompressed into the files f1 and f2). I notice that the 'overall alignment rate' looks low (0.12%), but it is similar in other samples. Is this something I need to worry about? Let me know if there is a way to dig further into this.
Thanks,
Stijn

Error

924078 reads; of these:
  924078 (100.00%) were paired; of these:
    923196 (99.90%) aligned concordantly 0 times
    882 (0.10%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    923196 pairs aligned concordantly 0 times; of these:
      22 (0.00%) aligned discordantly 1 time
    ----
    923174 pairs aligned 0 times concordantly or discordantly; of these:
      1846348 mates make up the pairs; of these:
        1845935 (99.98%) aligned 0 times
        321 (0.02%) aligned exactly 1 time
        92 (0.00%) aligned >1 times
0.12% overall alignment rate
924078 reads; of these:
  924078 (100.00%) were paired; of these:
    922260 (99.80%) aligned concordantly 0 times
    1818 (0.20%) aligned concordantly exactly 1 time
    0 (0.00%) aligned concordantly >1 times
    ----
    922260 pairs aligned concordantly 0 times; of these:
      6 (0.00%) aligned discordantly 1 time
    ----
    922254 pairs aligned 0 times concordantly or discordantly; of these:
      1844508 mates make up the pairs; of these:
        1843450 (99.94%) aligned 0 times
        863 (0.05%) aligned exactly 1 time
        195 (0.01%) aligned >1 times
0.25% overall alignment rate
Traceback (most recent call last):
  File "/usr/local/bin/tracer", line 11, in <module>
    load_entry_point('tracer==0.5', 'console_scripts', 'tracer')()
  File "/usr/local/lib/python3.5/dist-packages/tracer-0.5-py3.5.egg/tracerlib/launcher.py", line 43, in launch
    Task().run()
  File "/usr/local/lib/python3.5/dist-packages/tracer-0.5-py3.5.egg/tracerlib/tasks.py", line 372, in run
    cell = self.ig_blast()
  File "/usr/local/lib/python3.5/dist-packages/tracer-0.5-py3.5.egg/tracerlib/tasks.py", line 515, in ig_blast
    self.max_junc_len)
  File "/usr/local/lib/python3.5/dist-packages/tracer-0.5-py3.5.egg/tracerlib/io.py", line 113, in parse_IgBLAST
    loci, max_junc_len)
  File "/usr/local/lib/python3.5/dist-packages/tracer-0.5-py3.5.egg/tracerlib/tracer_func.py", line 213, in find_possible_alignments
    query_name, species, loci_for_segments)
  File "/usr/local/lib/python3.5/dist-packages/tracer-0.5-py3.5.egg/tracerlib/tracer_func.py", line 382, in get_fasta_line_for_contig_imgt
    best_V_seq = IMGT_seqs[V_locus_key][segment]
KeyError: 'TRAV29_DV5*03'

Directory structure:

root@robin:/workspace/svd/work2/7d/140d7c90d67b9c1900e3ee491987da# find -maxdepth 3 -exec ls -l {} \;
total 336591
lrwxrwxrwx 1 root root        55 May 13 12:50 Human_colon_16S7722453_1.fastq.gz -> /workspace/svd/fastqs/Human_colon_16S7722453_1.fastq.gz
lrwxrwxrwx 1 root root        55 May 13 12:50 Human_colon_16S7722453_2.fastq.gz -> /workspace/svd/fastqs/Human_colon_16S7722453_2.fastq.gz
-rw-r--r-- 1 root root 172334567 May 13 12:50 f1
-rw-r--r-- 1 root root 172334567 May 13 12:50 f2
drwxr-xr-x 3 root root        48 May 13 12:50 out_asm
-rw-r--r-- 1 root root 235 May 13 12:50 ./.command.sh
-rw-r--r-- 1 root root 3484 May 13 12:50 ./.command.stub
-rw-r--r-- 1 root root 2274 May 13 12:50 ./.command.run
-rw-r--r-- 1 root root 717 May 13 12:50 ./.command.yaml
-rw-r--r-- 1 root root 0 May 13 12:50 ./.command.begin
lrwxrwxrwx 1 root root 55 May 13 12:50 ./Human_colon_16S7722453_1.fastq.gz -> /workspace/svd/fastqs/Human_colon_16S7722453_1.fastq.gz
lrwxrwxrwx 1 root root 55 May 13 12:50 ./Human_colon_16S7722453_2.fastq.gz -> /workspace/svd/fastqs/Human_colon_16S7722453_2.fastq.gz
-rw-r--r-- 1 root root 24373 May 13 12:50 ./.command.out
-rw-r--r-- 1 root root 2283 May 13 12:50 ./.command.err
-rw-r--r-- 1 root root 194 May 13 12:51 ./.command.trace
-rw-r--r-- 1 root root 172334567 May 13 12:50 ./f1
-rw-r--r-- 1 root root 172334567 May 13 12:50 ./f2
total 0
drwxr-xr-x 8 root root 184 May 13 12:50 out-Human_colon_16S7722453
total 4
drwxr-xr-x 2 root root  240 May 13 14:44 IgBLAST_output
drwxr-xr-x 2 root root  218 May 13 12:50 Trinity_output
drwxr-xr-x 2 root root 4096 May 13 12:50 aligned_reads
drwxr-xr-x 2 root root   10 May 13 12:50 expression_quantification
drwxr-xr-x 2 root root   10 May 13 12:50 filtered_TCR_seqs
drwxr-xr-x 2 root root   10 May 13 12:50 unfiltered_TCR_seqs
total 3499
-rw-r--r-- 1 root root  951335 May 13 12:50 out-Human_colon_16S7722453_TCR_A.sam
-rw-r--r-- 1 root root  177160 May 13 12:50 out-Human_colon_16S7722453_TCR_A_1.fastq
-rw-r--r-- 1 root root  177160 May 13 12:50 out-Human_colon_16S7722453_TCR_A_2.fastq
-rw-r--r-- 1 root root 1558060 May 13 12:50 out-Human_colon_16S7722453_TCR_B.sam
-rw-r--r-- 1 root root  358400 May 13 12:50 out-Human_colon_16S7722453_TCR_B_1.fastq
-rw-r--r-- 1 root root  358400 May 13 12:50 out-Human_colon_16S7722453_TCR_B_2.fastq
total 10
-rw-r--r-- 1 root root 5268 May 13 12:50 out-Human_colon_16S7722453_TCR_A.Trinity.fasta
-rw-r--r-- 1 root root 3261 May 13 12:50 out-Human_colon_16S7722453_TCR_B.Trinity.fasta
-rw-r--r-- 1 root root   51 May 13 12:50 successful_trinity_assemblies.txt
-rw-r--r-- 1 root root    0 May 13 12:50 unsuccessful_trinity_assemblies.txt
total 90
-rw-r--r-- 1 root root 15495 May 13 12:50 out-Human_colon_16S7722453_TCR_A.IgBLASTOut
-rw-r--r-- 1 root root 41591 May 13 12:50 out-Human_colon_16S7722453_TCR_A_fmt3.IgBLASTOut
-rw-r--r-- 1 root root  9746 May 13 12:50 out-Human_colon_16S7722453_TCR_B.IgBLASTOut
-rw-r--r-- 1 root root 23827 May 13 12:50 out-Human_colon_16S7722453_TCR_B_fmt3.IgBLASTOut

Hi Mike,
thanks for the quick response and the suggestion to use -m assembly. I realised I left out some details: I'm using essentially the docker container teichlab/tracer with procps added. The command I run is this: tracer assemble -p 4 -s Hsap f1 f2 out-Human_colon_16S7722453 out_asm. It all works for the other 1535 samples (I just run the software, by the way; I'm not the scientist). That said, I am a bit curious about the alignment rate. A random selection from those 1536 samples is below; is this roughly within the expected/acceptable range? If that's the case we'll be very happy!
Many thanks,
Stijn

./83/9ea943ed0fe638dcc2eddedd6b98b0/.command.err:0.20% overall alignment rate
./83/a011e1467d6d7613817e33c04df70a/.command.err:0.09% overall alignment rate
./83/a011e1467d6d7613817e33c04df70a/.command.err:0.15% overall alignment rate
./83/5b84fe14081acb64b8a7d750774663/.command.err:0.08% overall alignment rate
./83/5b84fe14081acb64b8a7d750774663/.command.err:0.09% overall alignment rate
./83/09e9fbfd1e15f7340b5f9f395cb5a2/.command.err:0.04% overall alignment rate
./83/09e9fbfd1e15f7340b5f9f395cb5a2/.command.err:0.16% overall alignment rate
./83/8f2f56732833a2e03d79ab7c140f7a/.command.err:0.05% overall alignment rate
./83/8f2f56732833a2e03d79ab7c140f7a/.command.err:0.13% overall alignment rate
./e9/8bfe761f4ab3e5a84306578261a2be/.command.err:0.09% overall alignment rate
./e9/8bfe761f4ab3e5a84306578261a2be/.command.err:0.17% overall alignment rate
./e9/913649ac6df02441b755a6d50026a2/.command.err:0.00% overall alignment rate
./e9/913649ac6df02441b755a6d50026a2/.command.err:0.01% overall alignment rate
./e9/91b84f6e97c558967ce95a176c06ba/.command.err:0.06% overall alignment rate
./e9/91b84f6e97c558967ce95a176c06ba/.command.err:0.17% overall alignment rate
./e9/65cfc6a310fc111fe4c7414669c04a/.command.err:0.08% overall alignment rate
./e9/65cfc6a310fc111fe4c7414669c04a/.command.err:0.05% overall alignment rate
./e9/61b455fe01d14a6a83952f5921ef24/.command.err:0.02% overall alignment rate
./e9/61b455fe01d14a6a83952f5921ef24/.command.err:0.14% overall alignment rate
./e9/d4ecfeaa2fc8bdc8a7baaff1203216/.command.err:0.03% overall alignment rate
./e9/d4ecfeaa2fc8bdc8a7baaff1203216/.command.err:0.09% overall alignment rate
./e9/de46e46aafd7bd876557461950c2fb/.command.err:0.03% overall alignment rate
./e9/de46e46aafd7bd876557461950c2fb/.command.err:0.26% overall alignment rate
./e9/e538aa7cb1f52022f17063961115b8/.command.err:0.07% overall alignment rate
./e9/e538aa7cb1f52022f17063961115b8/.command.err:0.21% overall alignment rate
./e9/1dca45b0f774af51837183795ef1b2/.command.err:0.06% overall alignment rate
./e9/1dca45b0f774af51837183795ef1b2/.command.err:0.07% overall alignment rate
./5d/92d6848531de1314472cc93fd925e6/.command.err:0.01% overall alignment rate
./5d/92d6848531de1314472cc93fd925e6/.command.err:0.01% overall alignment rate
./5d/d89652caa95d22bcab04a3967c4fe4/.command.err:0.12% overall alignment rate
./5d/d89652caa95d22bcab04a3967c4fe4/.command.err:0.11% overall alignment rate
./5d/4bd31c8cb0d93e6931106f381d2fbd/.command.err:0.00% overall alignment rate
./5d/4bd31c8cb0d93e6931106f381d2fbd/.command.err:0.00% overall alignment rate
./5d/6ffc38f297d91e4129b4928df4b7e6/.command.err:0.03% overall alignment rate
./5d/6ffc38f297d91e4129b4928df4b7e6/.command.err:0.09% overall alignment rate
./1b/e4ad316393ef546f93cd61dac684ff/.command.err:0.00% overall alignment rate
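
For anyone wanting to look at the spread of these rates across all samples rather than eyeballing the grep output, here is a minimal sketch. It assumes the lines have the `path:rate% overall alignment rate` shape shown above; the function names `parse_rates` and `summarise` are made up for illustration and are not part of tracer. It only describes the distribution and says nothing about what rate is acceptable.

```python
import re
import statistics

# Matches grep-style lines such as
# "./83/9ea943ed0fe638dcc2eddedd6b98b0/.command.err:0.20% overall alignment rate"
RATE_RE = re.compile(
    r"^(?P<path>.*?):(?P<rate>\d+(?:\.\d+)?)% overall alignment rate\s*$"
)


def parse_rates(lines):
    """Return a list of (path, rate) tuples; lines that do not match are skipped."""
    rates = []
    for line in lines:
        match = RATE_RE.match(line.strip())
        if match:
            rates.append((match.group("path"), float(match.group("rate"))))
    return rates


def summarise(rates):
    """Return count, minimum, median and maximum of the parsed rates."""
    values = [rate for _, rate in rates]
    return {
        "n": len(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


# A few lines copied from the listing above.
sample_lines = [
    "./83/9ea943ed0fe638dcc2eddedd6b98b0/.command.err:0.20% overall alignment rate",
    "./83/a011e1467d6d7613817e33c04df70a/.command.err:0.09% overall alignment rate",
    "./83/a011e1467d6d7613817e33c04df70a/.command.err:0.15% overall alignment rate",
    "./83/5b84fe14081acb64b8a7d750774663/.command.err:0.08% overall alignment rate",
]

summary = summarise(parse_rates(sample_lines))
print(summary)
```

Each work directory contributes two lines because the aligner is run once per locus (TCR_A and TCR_B), so the same path appears twice in the parsed list.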

Hi Mike,

I ran into the same issue, KeyError: 'TRAV29_DV5*03'. However, adding the -m assembly flag didn't help.

726715 reads; of these:
  726715 (100.00%) were unpaired; of these:
    725309 (99.81%) aligned 0 times
    1406 (0.19%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.19% overall alignment rate
726715 reads; of these:
  726715 (100.00%) were unpaired; of these:
    722254 (99.39%) aligned 0 times
    4461 (0.61%) aligned exactly 1 time
    0 (0.00%) aligned >1 times
0.61% overall alignment rate
Traceback (most recent call last):
  File "/home/amadrigal/tools/tracer/v0.6.0/tracer", line 21, in <module>
    launch()
  File "/home/amadrigal/tools/tracer/v0.6.0/tracerlib/launcher.py", line 43, in launch
    Task().run()
  File "/home/amadrigal/tools/tracer/v0.6.0/tracerlib/tasks.py", line 372, in run
    cell = self.ig_blast()
  File "/home/amadrigal/tools/tracer/v0.6.0/tracerlib/tasks.py", line 515, in ig_blast
    self.max_junc_len)
  File "/home/amadrigal/tools/tracer/v0.6.0/tracerlib/io.py", line 113, in parse_IgBLAST
    loci, max_junc_len)
  File "/home/amadrigal/tools/tracer/v0.6.0/tracerlib/tracer_func.py", line 213, in find_possible_alignments
    query_name, species, loci_for_segments)
  File "/home/amadrigal/tools/tracer/v0.6.0/tracerlib/tracer_func.py", line 382, in get_fasta_line_for_contig_imgt
    best_V_seq = IMGT_seqs[V_locus_key][segment]
KeyError: 'TRAV29_DV5*03'

jfass commented

I'll second this; I'm still seeing this error with the command:

docker run --rm -v /home/ubuntu/TCRseq/downsampled:/scratch -w /scratch teichlab/tracer assemble -m assembly -p 48 -s Hsap /scratch/$fwd $rev $base ${pct}pct/${base}.TRACER

Oddly enough ... this is a down-sampling study, where I'm reducing the size of the read set to see how low we can go with future sequencing ... and I only see this error at one particular fraction of the read set, not above and below. I don't think that's relevant to the error, just interesting that there'd be this "instability" in the identified alleles.
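
To illustrate what the traceback is saying: the failing line is a nested dictionary lookup, `IMGT_seqs[V_locus_key][segment]`, and the KeyError means the segment name `TRAV29_DV5*03` is not a key in the inner dictionary. The sketch below is not tracer's code. It is a hypothetical helper (`lookup_segment`) with a placeholder reference (the locus key `TCR_A_V`, the allele names other than the failing one, and the sequences are all invented for illustration) that shows how such a lookup could report similarly named entries instead of raising.

```python
import difflib


def lookup_segment(imgt_seqs, locus_key, segment):
    """Look up a segment sequence without raising KeyError.

    Returns (sequence, []) when the segment is present, otherwise
    (None, suggestions), where suggestions are similarly named keys.
    """
    locus_seqs = imgt_seqs.get(locus_key, {})
    if segment in locus_seqs:
        return locus_seqs[segment], []
    suggestions = difflib.get_close_matches(
        segment, list(locus_seqs), n=5, cutoff=0.8
    )
    return None, suggestions


# Placeholder reference: key names and sequences are illustrative only and
# are NOT taken from tracer's bundled resources.
toy_reference = {
    "TCR_A_V": {
        "TRAV29_DV5*01": "ACGT",
        "TRAV29_DV5*02": "ACGA",
        "TRAV1-1*01": "TTGC",
    }
}

seq, near = lookup_segment(toy_reference, "TCR_A_V", "TRAV29_DV5*03")
if seq is None:
    print("segment not in reference; similar names:", sorted(near))
```

If the real reference behaves like this toy one, the error would come from IgBLAST reporting an allele that the sequence dictionary does not contain, which might also explain why it only shows up for some read subsets: the best-matching allele can shift as the assembled contig changes.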