samtools returned with error 1: stdout=, stderr=samtools sort: truncated file. Aborting\n
HIV-TRACE is failing for me as follows:
> hivtrace -i all_sequences_uniquenames_2.fasta -a resolve -r HXB2_prrt -t 0.015 -m 500 -g .05
[E::bam_read1] CIGAR and query sequence lengths differ for XXXX
Traceback (most recent call last):
File "/usr/local/bin/bealign", line 207, in <module>
args.keep_reference
File "/usr/local/bin/bealign", line 94, in main
BamIO.sort(output_file)
File "/usr/local/lib/python3.7/site-packages/BioExt/io/BamIO/__init__.py", line 33, in sort
pysam_sort("-o", tmp_path, path)
File "/usr/local/lib/python3.7/site-packages/pysam/utils.py", line 75, in __call__
stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools sort: truncated file. Aborting\n'
Traceback (most recent call last):
File "/usr/local/bin/hivtrace", line 10, in <module>
sys.exit(main())
File "/usr/local/lib/python3.7/site-packages/hivtrace/hivtrace.py", line 840, in main
attributes_file=ATTRIBUTES_FILE)
File "/usr/local/lib/python3.7/site-packages/hivtrace/hivtrace.py", line 467, in hivtrace
subprocess.check_call(bealign_process, stdout=DEVNULL)
File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py", line 347, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['bealign', '-q', '-r', 'HXB2_prrt', '-m', 'HIV_BETWEEN_F', '-R', 'all_sequences_uniquenames_2.fasta', '/var/folders/52/fk2tljnj5sbgnzr8p0t9777w0000gn/T/hivtrace-vgqhv1ru/all_sequences_uniquenames_2.fasta_output.bam']' returned non-zero exit status 1.
Any ideas? What I've replaced with XXXX is the name of the first sequence in the alignment.
Dear @mdhall272,
You may have stumbled across an edge case that causes either a faulty alignment or a faulty CIGAR string computation. We analyze many datasets a day using hivtrace, so this is truly a peculiar issue.
Would it be at all possible to send a dataset with non-sensitive information that reproduces the issue to sweaver@temple.edu?
If not, I will see if I can simulate data to cause the same effect.
Best,
Steven
Dear @mdhall272,
This issue occurs when an unexpected character is used to represent gaps in the alignment. For example, when a ~ is used instead of - for gaps. Can you confirm that is the case for you?
Regardless, I will keep this issue open until we add validation checks for this.
Best,
Steven
Just to confirm (this had slipped my mind) - it was indeed that my input data was using a ? character. It works if I replace those with Ns.
A feature request - any chance of a check for this early? I accidentally ran an analysis of 30,000 sequences with this input, and it took several hours before the error occurred.
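In the meantime, a pre-flight check along these lines could be run on the FASTA file before starting hivtrace. This is only a minimal sketch, not part of hivtrace: the function name `find_invalid_characters` is hypothetical, and the allowed alphabet (IUPAC nucleotide codes plus - for gaps) is an assumption that may not match exactly what the pipeline accepts.

```python
# Assumption: IUPAC nucleotide codes plus "-" for gaps. This set is not taken
# from the hivtrace source; adjust it to whatever the pipeline accepts.
ALLOWED = set("ACGTURYSWKMBDHVN-")


def find_invalid_characters(lines):
    """Yield (sequence_name, line_number, character) for each unexpected character.

    `lines` is any iterable of FASTA lines, e.g. an open file handle.
    """
    name = None
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: remember the sequence name for reporting.
            name = line[1:]
            continue
        # Report each distinct unexpected character once per line.
        for character in sorted(set(line.upper()) - ALLOWED):
            yield name, line_number, character


# Small in-memory example; for a real file, pass open("input.fasta") instead.
sample = [">seq1", "ACGT-N", ">seq2", "AC?T~A"]
for name, line_number, character in find_invalid_characters(sample):
    print(f"{name} (line {line_number}): unexpected character {character!r}")
```

A scan like this is a single pass over the file, so it should flag a ? or ~ before any alignment work begins rather than hours into the run.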