Provide Stanford HIVdb scoring workflow
Closed this issue · 4 comments
ArtPoon commented
Easiest to work off the sample consensus sequence file.
ArtPoon commented
Example file works:
Elzar:micall artpoon$ python3 micall/utils/scoreHIVdb.py examples/Example_S1_L001_R1_001.conseq.csv test.csv
searching path /usr/local/lib/python3.7/site-packages/sierralocal/data/HIVDB*.xml
searching path /usr/local/lib/python3.7/site-packages/sierralocal/data/apobec*.tsv
HIVdb version 8.8
Found NucAmino binary /usr/local/lib/python3.7/site-packages/sierralocal/bin/nucamino-darwin-amd64
Aligned /var/folders/kj/tky1wj814tj_l0zdby1r3f980000gn/T/tmp0efwapz0
MAX
0.010
0.020
0.050
0.100
0.200
But I'm encountering trouble with a different data set:
Elzar:micall artpoon$ python3 micall/utils/scoreHIVdb.py examples/SRS5100454_1.conseq.csv SRS5100454.hivdb.csv
searching path /usr/local/lib/python3.7/site-packages/sierralocal/data/HIVDB*.xml
searching path /usr/local/lib/python3.7/site-packages/sierralocal/data/apobec*.tsv
HIVdb version 8.8
Found NucAmino binary /usr/local/lib/python3.7/site-packages/sierralocal/bin/nucamino-darwin-amd64
Traceback (most recent call last):
  File "micall/utils/scoreHIVdb.py", line 93, in <module>
    main()
  File "micall/utils/scoreHIVdb.py", line 63, in main
    header, scores, mutlist = next(results)
  File "micall/utils/scoreHIVdb.py", line 26, in score_conseq
    file_trims, subtypes = scorefile(tf.name, algorithm)
  File "/usr/local/lib/python3.7/site-packages/sierralocal/main.py", line 47, in scorefile
    result = aligner.align_file(input_file)
  File "/usr/local/lib/python3.7/site-packages/sierralocal/nucaminohook.py", line 183, in align_file
    for record in result['POL']:
KeyError: 'POL'
If I convert the conseq CSV file into FASTA, it processes fine with sierralocal.
Elzar:examples artpoon$ sierralocal SRS5100454_1.conseq.fa
searching path /usr/local/lib/python3.7/site-packages/sierralocal/data/HIVDB*.xml
searching path /usr/local/lib/python3.7/site-packages/sierralocal/data/apobec*.tsv
HIVdb version 8.8
Found NucAmino binary /usr/local/lib/python3.7/site-packages/sierralocal/bin/nucamino-darwin-amd64
Aligned SRS5100454_1.conseq.fa
7 sequences found in file SRS5100454_1.conseq.fa.
Writing JSON to file SRS5100454_1.conseq_results.json
Time elapsed: 1.5831 seconds (4.5628 it/s)
I'm guessing this is some kind of thread concurrency issue.
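For reference, the CSV-to-FASTA conversion mentioned above could look roughly like the sketch below. It is not the code used in `scoreHIVdb.py`. The function name `conseq_csv_to_fasta` is hypothetical, and the column names `region`, `consensus-percent-cutoff` and `sequence` are assumptions about the conseq CSV layout that should be checked against the actual file header.

```python
import csv


def conseq_csv_to_fasta(csv_path, fasta_path):
    """Write one FASTA record per non-empty row of a conseq CSV.

    Assumes (not confirmed here) that the CSV has the columns
    'region', 'consensus-percent-cutoff' and 'sequence'.
    Returns the number of records written.
    """
    count = 0
    # Both files are closed when the with-block exits, so the FASTA
    # is fully written to disk before anything else reads it.
    with open(csv_path, newline="") as src, open(fasta_path, "w") as dst:
        for row in csv.DictReader(src):
            seq = row["sequence"]
            if not seq:
                continue  # skip rows with no consensus sequence
            label = f'{row["region"]}_{row["consensus-percent-cutoff"]}'
            dst.write(f">{label}\n{seq}\n")
            count += 1
    return count
```

The resulting file can then be passed to the `sierralocal` command as in the run above.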
Anastasiamiaomiao commented
Sorry, I didn't understand.
What are test.csv and SRS5100454.hivdb.csv?
Is the conversion of files implemented with micall or other software?
Thanks!
ArtPoon commented
Hi @Anastasiamiaomiao, this issue is a note for myself to record a problem that I encountered while integrating sierra-local into MiCall. I haven't resolved it yet.
ArtPoon commented
Duh. Forgot to close the file handle before passing the stream to sierralocal.
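A minimal sketch of that failure mode, assuming the sequences are written to a temporary file whose path is then handed to another reader (as `scorefile(tf.name, algorithm)` in the traceback suggests). The helper name `sizes_before_and_after_close` is hypothetical and this is not the actual fix applied in `scoreHIVdb.py`. The point is that data written to an open handle can still sit in Python's write buffer, so a second reader opening the same path may see an empty or truncated file until the handle is flushed or closed.

```python
import os
import tempfile


def sizes_before_and_after_close(records):
    """Write FASTA records to a temp file and report its on-disk size
    before and after the handle is closed.

    `records` is an iterable of (name, sequence) pairs.
    """
    tf = tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False, newline="")
    try:
        for name, seq in records:
            tf.write(f">{name}\n{seq}\n")
        # The handle is still open: buffered data may not be on disk yet,
        # so another process reading tf.name can get incomplete input.
        size_before = os.path.getsize(tf.name)
        tf.close()  # closing flushes the buffer to disk
        size_after = os.path.getsize(tf.name)
    finally:
        if not tf.closed:
            tf.close()
        os.remove(tf.name)
    return size_before, size_after
```

Closing the handle (or at least calling `flush()`) before passing `tf.name` to the downstream tool avoids the problem. An incompletely written input would be consistent with the aligner returning no `POL` records, although that link is an inference from the thread rather than something verified here.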