PoonLab/MiCall-Lite

Provide Stanford HIVdb scoring workflow

Closed this issue · 4 comments

It is easiest to work from the sample consensus sequence file.

Example file works:

Elzar:micall artpoon$ python3 micall/utils/scoreHIVdb.py examples/Example_S1_L001_R1_001.conseq.csv test.csv
searching path /usr/local/lib/python3.7/site-packages/sierralocal/data/HIVDB*.xml
searching path /usr/local/lib/python3.7/site-packages/sierralocal/data/apobec*.tsv
HIVdb version 8.8
Found NucAmino binary /usr/local/lib/python3.7/site-packages/sierralocal/bin/nucamino-darwin-amd64
Aligned /var/folders/kj/tky1wj814tj_l0zdby1r3f980000gn/T/tmp0efwapz0
MAX
0.010
0.020
0.050
0.100
0.200

But I'm encountering trouble with a different data set:

Elzar:micall artpoon$ python3 micall/utils/scoreHIVdb.py examples/SRS5100454_1.conseq.csv SRS5100454.hivdb.csv 
searching path /usr/local/lib/python3.7/site-packages/sierralocal/data/HIVDB*.xml
searching path /usr/local/lib/python3.7/site-packages/sierralocal/data/apobec*.tsv
HIVdb version 8.8
Found NucAmino binary /usr/local/lib/python3.7/site-packages/sierralocal/bin/nucamino-darwin-amd64
Traceback (most recent call last):
  File "micall/utils/scoreHIVdb.py", line 93, in <module>
    main()
  File "micall/utils/scoreHIVdb.py", line 63, in main
    header, scores, mutlist = next(results)
  File "micall/utils/scoreHIVdb.py", line 26, in score_conseq
    file_trims, subtypes = scorefile(tf.name, algorithm)
  File "/usr/local/lib/python3.7/site-packages/sierralocal/main.py", line 47, in scorefile
    result = aligner.align_file(input_file)
  File "/usr/local/lib/python3.7/site-packages/sierralocal/nucaminohook.py", line 183, in align_file
    for record in result['POL']:
KeyError: 'POL'

If I convert the conseq CSV file into FASTA, it processes fine with sierralocal.

Elzar:examples artpoon$ sierralocal SRS5100454_1.conseq.fa
searching path /usr/local/lib/python3.7/site-packages/sierralocal/data/HIVDB*.xml
searching path /usr/local/lib/python3.7/site-packages/sierralocal/data/apobec*.tsv
HIVdb version 8.8
Found NucAmino binary /usr/local/lib/python3.7/site-packages/sierralocal/bin/nucamino-darwin-amd64
Aligned SRS5100454_1.conseq.fa
7 sequences found in file SRS5100454_1.conseq.fa.
Writing JSON to file SRS5100454_1.conseq_results.json
Time elapsed: 1.5831 seconds (4.5628 it/s)
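
For reference, the CSV-to-FASTA conversion mentioned above can be sketched as follows. This is a minimal sketch, not the script actually used: the column names `region`, `consensus-percent-cutoff` and `sequence` are assumptions about the conseq CSV layout, the helper `conseq_to_fasta` is hypothetical, and the two sample rows are made-up data.

```python
import csv
import io


def conseq_to_fasta(csv_handle, fasta_handle):
    """Write one FASTA record per CSV row and return the record count."""
    count = 0
    for row in csv.DictReader(csv_handle):
        # Column names below are assumptions about the conseq CSV layout.
        label = "{}_{}".format(row["region"], row["consensus-percent-cutoff"])
        fasta_handle.write(">{}\n{}\n".format(label, row["sequence"]))
        count += 1
    return count


# Made-up sample rows, used only to exercise the function.
example = io.StringIO(
    "region,consensus-percent-cutoff,sequence\n"
    "HIV1B-pol,MAX,CCTCAGGTCACT\n"
    "HIV1B-pol,0.010,CCTCAGRTCACT\n"
)
out = io.StringIO()
n_records = conseq_to_fasta(example, out)
print(out.getvalue())
```

With real files, opening both handles in a `with` block ensures the FASTA is fully written and closed before its path is handed to `sierralocal`.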

I'm guessing this is some kind of thread concurrency issue.

Sorry, I didn't understand.
What is the test.csv and SRS5100454.hivdb.csv?
Is the conversion of files implemented with micall or other software?
Thanks!

Hi @Anastasiamiaomiao, this issue is a note for myself to record a problem that I encountered while integrating sierra-local into MiCall. I haven't resolved it yet.

Duh. I forgot to close the file handle before passing the stream to sierralocal.
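
To illustrate why an unclosed handle can produce this kind of failure, here is a minimal sketch. It assumes that `tf` in the traceback is a `tempfile.NamedTemporaryFile` (the source of `scoreHIVdb.py` is not shown in this thread), and `size_seen_by_path` is a hypothetical stand-in for any external reader that opens the file by name, as `scorefile(tf.name, algorithm)` does. A short write stays in Python's buffer, so a reader that opens the path sees an empty file until the handle is flushed or closed.

```python
import os
import tempfile


def size_seen_by_path(path):
    """Return the number of bytes another reader would find at this path."""
    return os.path.getsize(path)


record = ">sample\nCCTCAGGTCACTCTTTGG\n"

# delete=False so the file survives close() and can be removed manually.
tf = tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False)
tf.write(record)

# The short record is still in the write buffer, not on disk.
size_before = size_seen_by_path(tf.name)

# Closing (or flushing) the handle pushes the buffered text to disk.
tf.close()
size_after = size_seen_by_path(tf.name)

print(size_before, size_after)
os.remove(tf.name)
```

If the temporary file has to stay open, calling `tf.flush()` before handing `tf.name` to the aligner should have the same effect as closing it.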