adamewing/bamsurgeon

Samtools view issue


Hi, I am trying to insert ~120 mutations into my BAM file using the BAMSurgeon addsnv.py script, but I keep hitting this error:

2022-08-25T11:00:28.235353426Z [main_samview] fail to read the header from "addsnv.tmp/haplo_EB0001_1008267_1008267.tmpbam.a72f8dcd-f432-4b57-abfe-b2055ab518fe.bam.realign.sam".
2022-08-25T11:00:28.239915749Z INFO 2022-08-25 11:00:28,239 haplo_EB0001_2724975_2724975 creating tmp bam: addsnv.tmp/haplo_EB0001_2724975_2724975.tmpbam.b95d3939-19c5-464b-a6aa-6eee951a0d9a.bam
2022-08-25T11:00:28.365411158Z [Thu Aug 25 11:00:28 UTC 2022] picard.sam.SamToFastq done. Elapsed time: 0.01 minutes.
2022-08-25T11:00:28.365631285Z Runtime.totalMemory()=517996544
2022-08-25T11:00:28.408366877Z 11:00:28.406 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/picard.jar!/com/intel/gkl/native/libgkl_compression.so
2022-08-25T11:00:28.427371990Z INFO 2022-08-25 11:00:28,427 haplo_EB0001_220985_220985 aligning addsnv.tmp/haplo_EB0001_220985_220985.tmpbam.d7fd2714-6138-47d4-8a0e-f4527625c8ec.fastq with bwa mem
2022-08-25T11:00:28.436464771Z [E::bwa_idx_load_from_disk] fail to locate the index files
2022-08-25T11:00:28.436903841Z INFO 2022-08-25 11:00:28,436 haplo_EB0001_220985_220985 writing addsnv.tmp/haplo_EB0001_220985_220985.tmpbam.d7fd2714-6138-47d4-8a0e-f4527625c8ec.bam.realign.sam to BAM...
2022-08-25T11:00:28.441893491Z INFO	2022-08-25 11:00:28	SamToFastq	
2022-08-25T11:00:28.441916700Z 
2022-08-25T11:00:28.441921046Z ********** NOTE: Picard's command line syntax is changing.
2022-08-25T11:00:28.441924895Z **********
2022-08-25T11:00:28.441928440Z ********** For more information, please see:
2022-08-25T11:00:28.441932050Z ********** 
2022-08-25T11:00:28.441935848Z https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
2022-08-25T11:00:28.441939934Z **********
2022-08-25T11:00:28.441943502Z ********** The command line looks like this in the new syntax:
2022-08-25T11:00:28.441947100Z **********
2022-08-25T11:00:28.441951172Z **********    SamToFastq -VALIDATION_STRINGENCY SILENT -INPUT addsnv.tmp/haplo_EB0001_1469895_1469895.tmpbam.843edc09-803a-4f9e-87f2-8003be7fcf34.bam -INCLUDE_NON_PRIMARY_ALIGNMENTS false -FASTQ addsnv.tmp/haplo_EB0001_1469895_1469895.tmpbam.843edc09-803a-4f9e-87f2-8003be7fcf34.fastq -INTERLEAVE true
2022-08-25T11:00:28.441956962Z **********
2022-08-25T11:00:28.441960501Z 
2022-08-25T11:00:28.441963877Z 
2022-08-25T11:00:28.463290448Z [main_samview] fail to read the header from "addsnv.tmp/haplo_EB0001_220985_220985.tmpbam.d7fd2714-6138-47d4-8a0e-f4527625c8ec.bam.realign.sam".
2022-08-25T11:00:28.465594101Z [Thu Aug 25 11:00:28 UTC 2022] SamToFastq INPUT=addsnv.tmp/haplo_EB0001_1666736_1666736.tmpbam.e2df6f83-9763-49c7-a1b6-71f2cf77e6b4.bam FASTQ=addsnv.tmp/haplo_EB0001_1666736_1666736.tmpbam.e2df6f83-9763-49c7-a1b6-71f2cf77e6b4.fastq INTERLEAVE=true INCLUDE_NON_PRIMARY_ALIGNMENTS=false VALIDATION_STRINGENCY=SILENT    OUTPUT_PER_RG=false COMPRESS_OUTPUTS_PER_RG=false RG_TAG=PU RE_REVERSE=true INCLUDE_NON_PF_READS=false CLIPPING_MIN_LENGTH=0 READ1_TRIM=0 READ2_TRIM=0 VERBOSITY=INFO QUIET=false COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
2022-08-25T11:00:28.467823955Z concurrent.futures.process._RemoteTraceback: 
2022-08-25T11:00:28.467844140Z """
2022-08-25T11:00:28.467848901Z Traceback (most recent call last):
2022-08-25T11:00:28.467852780Z   File "/usr/local/lib/python3.9/concurrent/futures/process.py", line 243, in _process_worker
2022-08-25T11:00:28.467856982Z     r = call_item.fn(*call_item.args, **call_item.kwargs)
2022-08-25T11:00:28.467860852Z   File "/opt/bamsurgeon/bin/addsnv.py", line 243, in makemut
2022-08-25T11:00:28.467865608Z     aligners.remap_bam(args.aligner, tmpoutbamname, args.refFasta, alignopts, threads=int(args.alignerthreads), mutid=hapstr, paired=(not args.single), picardjar=args.picardjar, insane=args.insane)
2022-08-25T11:00:28.467869938Z   File "/opt/bamsurgeon/bin/bamsurgeon/aligners.py", line 76, in remap_bam
2022-08-25T11:00:28.467873864Z     remap_bwamem_bam(bamfn, threads, fastaref, picardjar, mutid=mutid, paired=paired, insane=insane)
2022-08-25T11:00:28.467877605Z   File "/opt/bamsurgeon/bin/bamsurgeon/aligners.py", line 206, in remap_bwamem_bam
2022-08-25T11:00:28.467881602Z     subprocess.check_call(bam_cmd)
2022-08-25T11:00:28.467885110Z   File "/usr/local/lib/python3.9/subprocess.py", line 373, in check_call
2022-08-25T11:00:28.467889006Z     raise CalledProcessError(retcode, cmd)
2022-08-25T11:00:28.467892697Z subprocess.CalledProcessError: Command '['samtools', 'view', '-bt', 'EB0001_Annotated_20211221_2.ABS27526.fasta.fai', '-o', 'addsnv.tmp/haplo_EB0001_220985_220985.tmpbam.d7fd2714-6138-47d4-8a0e-f4527625c8ec.bam', 'addsnv.tmp/haplo_EB0001_220985_220985.tmpbam.d7fd2714-6138-47d4-8a0e-f4527625c8ec.bam.realign.sam']' returned non-zero exit status 1.
2022-08-25T11:00:28.467897784Z """
2022-08-25T11:00:28.467901347Z 
2022-08-25T11:00:28.467904935Z The above exception was the direct cause of the following exception:
2022-08-25T11:00:28.467908590Z 
2022-08-25T11:00:28.467911627Z Traceback (most recent call last):
2022-08-25T11:00:28.467915253Z   File "/opt/bamsurgeon/bin/addsnv.py", line 483, in <module>
2022-08-25T11:00:28.467981591Z [Thu Aug 25 11:00:28 UTC 2022] Executing as root@f962ac25120b on Linux 5.4.0-1071-aws amd64; OpenJDK 64-Bit Server VM 11.0.16+8-post-Ubuntu-0ubuntu120.04; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.27.4-SNAPSHOTINFO 2022-08-25 11:00:28,467 haplo_EB0001_2766544_2766544 creating tmp bam: addsnv.tmp/haplo_EB0001_2766544_2766544.tmpbam.c1d49a77-14c6-4e2a-8c80-78c91bb35fef.bam
2022-08-25T11:00:28.467988075Z 
2022-08-25T11:00:28.468102237Z     run()
2022-08-25T11:00:28.468278858Z   File "/opt/bamsurgeon/bin/addsnv.py", line 480, in run
2022-08-25T11:00:28.468526462Z     main(args)
2022-08-25T11:00:28.468536827Z   File "/opt/bamsurgeon/bin/addsnv.py", line 390, in main
2022-08-25T11:00:28.468737497Z     tmpbamlist = result.result()
2022-08-25T11:00:28.468754476Z   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 440, in result
2022-08-25T11:00:28.468983656Z     return self.__get_result()
2022-08-25T11:00:28.468994882Z   File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 389, in __get_result
2022-08-25T11:00:28.469193753Z     raise self._exception
2022-08-25T11:00:28.469280963Z subprocess.CalledProcessError: Command '['samtools', 'view', '-bt', 'EB0001_Annotated_20211221_2.ABS27526.fasta.fai', '-o', 'addsnv.tmp/haplo_EB0001_220985_220985.tmpbam.d7fd2714-6138-47d4-8a0e-f4527625c8ec.bam', 'addsnv.tmp/haplo_EB0001_220985_220985.tmpbam.d7fd2714-6138-47d4-8a0e-f4527625c8ec.bam.realign.sam']' returned non-zero exit status 1.

I am using the latest BAMsurgeon: 1.4.1.
Dependency versions:

  • samtools 1.15.1
  • Picard 2.27.4

This is the command line:
python3.9 /opt/bamsurgeon/bin/addsnv.py --picardjar /opt/picard.jar --aligner mem -o Ferm619-2.simulated.bam --alignerthreads 32 -r EB0001_Annotated_20211221_2.ABS27526.fasta -v _1_EB001_simulated_mutations_with_alt.tsv -p 32 -f Ferm619-2.bam

I also checked the coverage at these positions, and the problematic position has 122X coverage. Can someone point me in the right direction: what should I debug, and what should I try?
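
One line in the log above, `[E::bwa_idx_load_from_disk] fail to locate the index files`, suggests that bwa may not be finding an index for the reference. If so, the `.realign.sam` file would be left without a header, which would be consistent with the samtools error. A minimal sketch for checking this, assuming the reference is expected to be indexed with `bwa index` and its usual output suffixes (the helper name `missing_bwa_index_files` is hypothetical and not part of BAMSurgeon):

```python
from pathlib import Path

# Suffixes that `bwa index` normally writes next to the reference FASTA.
# Assumption: classic bwa, not bwa-mem2 (which uses different file names).
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_bwa_index_files(ref_fasta):
    """Return the expected BWA index files that are absent for ref_fasta."""
    ref = str(ref_fasta)
    expected = [Path(ref + suffix) for suffix in BWA_INDEX_SUFFIXES]
    return [path for path in expected if not path.is_file()]


if __name__ == "__main__":
    # Reference path taken from the -r argument of the command line above.
    ref = "EB0001_Annotated_20211221_2.ABS27526.fasta"
    missing = missing_bwa_index_files(ref)
    if missing:
        print("Missing BWA index files (consider running `bwa index` on the reference):")
        for path in missing:
            print("  ", path)
    else:
        print("All expected BWA index files found for", ref)
```

If any files are reported missing, running `bwa index` on the reference in the same directory and then re-running addsnv.py would be the first thing to try. The check should be run in the same working directory (or container) as addsnv.py, since the reference is passed as a relative path.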