EI-CoreBioinformatics/mikado

[main_samview] fail to read the header from "-".

NJeanray opened this issue · 4 comments

Hello,

I got some troubles running alignment with gsnap. Here is the error I get :

[Fri Aug  6 18:17:34 2021]
Job 6: Aligning RNAseq with gsnap (sample SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME - run 0)
Reason: Missing output files: Daijin/2-alignments/output/gsnap-SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0.bam; Input files updated by another job: /home/NJEANRAY/Daijin/Daijin/2-alignments/gsnap/gsnap_index.done

  gsnap --gunzip --bunzip2 --dir=Daijin/2-alignments/gsnap/index --db=Daijin  --novelsplicing=1 --localsplicedist=10000 --nthreads=40 --format=sam --npaths=20 Daijin/1-reads/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME.R1.fq.gz Daijin/1-reads/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME.R2.fq.gz 2> Daijin/2-alignments/logs/gsnap/gsnap-SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0.log | samtools view -b -@ 40 - > Daijin/2-alignments/gsnap/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0/gsnap.bam && ln -sf ../gsnap/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0/gsnap.bam Daijin/2-alignments/output/gsnap-SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0.bam && touch -h Daijin/2-alignments/output/gsnap-SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0.bam
[main_samview] fail to read the header from "-".
[Fri Aug  6 18:17:34 2021]
Error in rule align_gsnap:
    jobid: 6
    output: Daijin/2-alignments/gsnap/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0/gsnap.bam, Daijin/2-alignments/output/gsnap-SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0.bam
    log: Daijin/2-alignments/logs/gsnap/gsnap-SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0.log (check log file(s) for error message)
    shell:
          gsnap --gunzip --bunzip2 --dir=Daijin/2-alignments/gsnap/index --db=Daijin  --novelsplicing=1 --localsplicedist=10000 --nthreads=40 --format=sam --npaths=20 Daijin/1-reads/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME.R1.fq.gz Daijin/1-reads/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME.R2.fq.gz 2> Daijin/2-alignments/logs/gsnap/gsnap-SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0.log | samtools view -b -@ 40 - > Daijin/2-alignments/gsnap/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0/gsnap.bam && ln -sf ../gsnap/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0/gsnap.bam Daijin/2-alignments/output/gsnap-SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0.bam && touch -h Daijin/2-alignments/output/gsnap-SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0.bam
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job align_gsnap since they might be corrupted:
Daijin/2-alignments/gsnap/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0/gsnap.bam

Here are the versions of the tools used :

GSNAP version 2021-07-23 called with args: gsnap.avx2 --version
samtools 1.11
Using htslib 1.11

Could you please advice ?
Thanks in advance,

Best regards

Dear @NJeanray,

This seems like an issue with the version of samtools. Probably a previous version will work for now.

Dear @NJeanray

Would you please be able to check the error log here:

Daijin/2-alignments/logs/gsnap/gsnap-SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0.log

Another option would be to run the command:

gsnap --gunzip --bunzip2 --dir=Daijin/2-alignments/gsnap/index --db=Daijin --novelsplicing=1 --localsplicedist=10000 --nthreads=40 --format=sam --npaths=20 Daijin/1-reads/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME.R1.fq.gz Daijin/1-reads/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME.R2.fq.gz

as a standalone and check whether it actually functions? I cannot test locally at the moment (on a Mac machine and conda only has GSNAP 2021-05-27).

Things to look for: no errors and correct BAM header.

If there are errors we can take it from there.

Kind regards

Dear @lucventurini,

Please find below the content of

Daijin/2-alignments/logs/gsnap/gsnap-SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME-0.log

GSNAP version 2021-07-23 called with args: gsnap.avx2 --gunzip --bunzip2 --dir=Daijin/2-alignments/gsnap/index --db=Daijin --novelsplicing=1 --localsplicedist=10000 --nthreads=40 --format=sam --npaths=20 Daijin/1-reads/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME.R1.fq.gz Daijin/1-reads/SNIPr_01_03_24_43_2009_9_16_VI_RP_1_S5_ME.R2.fq.gz
Novel splicing (-N) turned on => assume reads are RNA-Seq
The novelend-splicedist 80000 is greater than the localsplicedist 10000.  Resetting novelend-splicedist to be 10000
Checking compiler assumptions for SSE2: 6B8B4567 327B23C6 xor=59F066A1
Checking compiler assumptions for SSE4.1: -103 -58 max=198 => compiler zero extends
Checking compiler assumptions for SSE4.2 options: 6B8B4567 __builtin_clz=1 __builtin_ctz=0 _mm_popcnt_u32=17 __builtin_popcount=17 
Finished checking compiler assumptions
This is a large genome of more than 2^32 (4 billion) bp.
You should run gsnapl instead.

Is this helping ?

Best regards

Dear @NJeanray

Yes this is helping ... the problem is that the genome is too big for a normal GMAP build and run. The logic in Daijin unfortunately does not support providing custom command line arguments to the building ... so there is currently no direct way to solve the issue.

We would need to amend the Daijin code to allow passing through building switches to GMAP for this to function.

Our oversight, we generally do not use Daijin with GSNAP or on such big genomes, so this bug slipped through.

CC @ljyanesm

Kind regards

Luca Venturini