veg/hivtrace

samtools returned with error 1: stdout=, stderr=samtools sort: truncated file. Aborting\n

Closed this issue · 6 comments

HIV-TRACE is failing for me as follows:

> hivtrace -i all_sequences_uniquenames_2.fasta -a resolve -r HXB2_prrt  -t 0.015 -m 500 -g .05
[E::bam_read1] CIGAR and query sequence lengths differ for XXXX
Traceback (most recent call last):
  File "/usr/local/bin/bealign", line 207, in <module>
    args.keep_reference
  File "/usr/local/bin/bealign", line 94, in main
    BamIO.sort(output_file)
  File "/usr/local/lib/python3.7/site-packages/BioExt/io/BamIO/__init__.py", line 33, in sort
    pysam_sort("-o", tmp_path, path)
  File "/usr/local/lib/python3.7/site-packages/pysam/utils.py", line 75, in __call__
    stderr))
pysam.utils.SamtoolsError: 'samtools returned with error 1: stdout=, stderr=samtools sort: truncated file. Aborting\n'
Traceback (most recent call last):
  File "/usr/local/bin/hivtrace", line 10, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/hivtrace/hivtrace.py", line 840, in main
    attributes_file=ATTRIBUTES_FILE)
  File "/usr/local/lib/python3.7/site-packages/hivtrace/hivtrace.py", line 467, in hivtrace
    subprocess.check_call(bealign_process, stdout=DEVNULL)
  File "/usr/local/Cellar/python/3.7.3/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py", line 347, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['bealign', '-q', '-r', 'HXB2_prrt', '-m', 'HIV_BETWEEN_F', '-R', 'all_sequences_uniquenames_2.fasta', '/var/folders/52/fk2tljnj5sbgnzr8p0t9777w0000gn/T/hivtrace-vgqhv1ru/all_sequences_uniquenames_2.fasta_output.bam']' returned non-zero exit status 1.

Any ideas? What I've replaced with XXXX is the first sequence in the alignment.

Dear @mdhall272,

You may have stumbled across an edge case that causes either a faulty alignment or a faulty CIGAR string computation. We analyze many datasets a day using hivtrace, so this is truly a peculiar issue.

Would it be possible to send a dataset that reproduces the issue, containing only non-sensitive information, to sweaver@temple.edu?

If not, I will see if I can simulate data to cause the same effect.

Best,
Steven

Dear @mdhall272,

I was able to replicate the issue. I'm looking into it now.

Best,
Steven

Dear @mdhall272,

This issue occurs when the input uses an unexpected character to represent gaps, for example ~ instead of -. Can you confirm that is the case for you?

Regardless, I will keep this issue open until we add validation checks for it.
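
In the meantime, a check along these lines can be run on the FASTA file before starting hivtrace. This is only a minimal sketch, not code from hivtrace or BioExt: the function name `find_unexpected_characters` is hypothetical, and the allowed alphabet (IUPAC nucleotide codes plus - for gaps) is an assumption rather than the exact set bealign accepts.

```python
# Assumed alphabet: IUPAC nucleotide codes plus "-" for gaps.
# This is an illustrative choice, not the set hivtrace or bealign enforces.
ALLOWED = frozenset("ACGTURYSWKMBDHVN-")


def find_unexpected_characters(lines, allowed=ALLOWED):
    """Scan FASTA lines and return (record_name, line_number, bad_chars) tuples.

    `lines` can be an open file handle or any iterable of strings.
    """
    problems = []
    name = None
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: remember the record name, do not validate it.
            name = line[1:].split()[0] if len(line) > 1 else ""
            continue
        # Sequence line: collect any characters outside the allowed set.
        bad = set(line.upper()) - allowed
        if bad:
            problems.append((name, line_number, sorted(bad)))
    return problems


example = [">seq1", "ACGT~~ACGT", ">seq2", "ACG?-NNT"]
print(find_unexpected_characters(example))
# [('seq1', 2, ['~']), ('seq2', 4, ['?'])]
```

Passing an open file handle instead of the example list scans a real file in one pass, so a problem is reported within seconds instead of after the alignment step has run.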

Best,
Steven

Just to confirm (this had slipped my mind): my input data was indeed using a ? character. It works if I replace those with Ns.
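
A minimal sketch of that replacement, assuming a plain FASTA file in which header lines start with >. The function name `replace_unknown` is hypothetical and not part of hivtrace. Only sequence lines are changed, so a ? inside a sequence name is left alone.

```python
def replace_unknown(lines, unknown="?", replacement="N"):
    """Yield FASTA lines with `unknown` replaced by `replacement` in sequence lines only."""
    for line in lines:
        if line.startswith(">"):
            yield line  # header line: keep the sequence name unchanged
        else:
            yield line.replace(unknown, replacement)


example = [">sample?1\n", "ACG??TTA-N\n"]
print("".join(replace_unknown(example)), end="")
# >sample?1
# ACGNNTTA-N
```

To clean a real file, the same generator can be fed an open input handle and its output written with `writelines` to a new file, which is then passed to hivtrace with -i.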

A feature request - any chance of an early check for this? I accidentally ran this on an analysis of 30,000 sequences, and it took several hours before the error occurred.

Dear @mdhall272,

Yes, that is reasonable. I will keep this issue open.

Best,
Steven.