rcFastq

Question

Closed this issue 10 months ago · 15 comments

Hello, I ran your Docker image with
apptainer run -B /workplace bidseq_latest.sif

and got this error: /pipeline/bin/rcFastq: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /pipeline/bin/rcFastq)

Could you update glibc?
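A minimal sketch for confirming the mismatch. It assumes a Python 3 interpreter is available where rcFastq actually runs (inside the container, not on the host). The script reads the C library version with the standard-library call platform.libc_ver() and compares it with the GLIBC_2.34 symbol version named in the error. The helper names parse_version and meets_minimum are hypothetical.

```python
import platform
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '2.34' into (2, 34) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(found: str, required: str) -> bool:
    """True if the glibc version `found` is at least `required`."""
    return parse_version(found) >= parse_version(required)


if __name__ == "__main__":
    # Assumption: run with the interpreter inside the container, so the
    # reported glibc is the one rcFastq is loaded against.
    lib, version = platform.libc_ver()
    if lib != "glibc" or not version:
        print("Could not determine the glibc version on this system.")
    else:
        status = "OK" if meets_minimum(version, "2.34") else "too old"
        print(f"glibc {version}: {status} for GLIBC_2.34")
```

If the check reports a version below 2.34, the binary was built against a newer glibc than the image provides. In that case the fix has to come from the image (a rebuilt binary or a newer base image), not from the host.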

Answer 1 · 2023-11-07T07:37:39.000Z

Hi, today I found that the parameter forward_stranded: false may be right for my data, so I set it in data.yaml. I also set speedy_mapping: true, but then I hit the problem below and can't figure it out. Did I set the parameters in the wrong place?

I also don't understand whether I should set my own adapter sequence, and if so, how to set it.

[Tue Nov 7 14:35:08 2023]
rule reverse_reads:
input: .tmp/trimmed_reads/WT-Testis-2-IP_run1_cut.fq.gz
output: .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz
jobid: 30
reason: Missing output files: .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz; Input files updated by another job: .tmp/trimmed_reads/WT-Testis-2-IP_run1_cut.fq.gz
wildcards: sample=WT-Testis-2-IP, rn=run1
resources: tmpdir=/tmp

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Tue Nov 7 14:35:08 2023]
Finished job 30.
9 of 111 steps (8%) done
Removing temporary output .tmp/trimmed_reads/WT-Testis-2-IP_run1_cut.fq.gz.
Select jobs to execute...
WorkflowError:
File .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz seems to be a broken symlink.

Answer 2 · 2023-11-07T07:47:29.000Z

Hi @hanguojun007, you ran the code correctly. This is an internal error of the pipeline: the temporary file was removed before the next step could use it. I had not thoroughly tested reversed libraries, so thank you for pointing this out. I have just fixed it in the latest version. Could you please run the apptainer command again? You do not need to remove the previous output, as the pipeline will automatically restart from this step.

Answer 3 · 2023-11-07T08:24:37.000Z

Thanks, it runs now!
I have a question: what does "reason: Missing output files" mean? Also, my terminal shows that the running process is bowtie2-align-s. Does that mean rRNA mapping is enabled? I set speedy_mapping: true!

rule map_to_genes_by_bowtie2:
input: .tmp/reversed_reads/mESCWT-rep1-input_run1.fq.gz, internal_files/mapping_index/genes.1.bt2
output: .tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.bam, .tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.fq, report_reads/mapping/mESCWT-rep1-input_run1_genes.report
jobid: 10
reason: Missing output files: report_reads/mapping/mESCWT-rep1-input_run1_genes.report, .tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.bam, .tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.fq; Input files updated by another job: .tmp/reversed_reads/mESCWT-rep1-input_run1.fq.gz
wildcards: sample=mESCWT-rep1-input, rn=run1
threads: 20
resources: tmpdir=/tmp

Answer 4 · 2023-11-07T08:34:07.000Z

Yes. "Missing output files" is not an error. The pipeline checks whether the output of each step exists and reruns the step if any output file is missing.
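To illustrate what the "reason:" lines mean, here is a simplified model of that check. It assumes only file existence and modification times matter. This is not Snakemake's actual implementation, which also tracks rule code, parameters, and jobs scheduled upstream in the same run (the "updated by another job" case). The function name rerun_reasons is hypothetical.

```python
import os
from typing import List, Sequence


def rerun_reasons(inputs: Sequence[str], outputs: Sequence[str]) -> List[str]:
    """Return simplified reasons why a step would run; an empty list means skip it.

    Rough model of the "reason:" lines only; Snakemake's real checks also
    cover rule code, parameters and jobs scheduled upstream in the same run.
    """
    missing = [path for path in outputs if not os.path.exists(path)]
    if missing:
        return ["Missing output files: " + ", ".join(missing)]
    if not outputs:
        return []  # nothing to compare against in this simplified model
    # All outputs exist: rerun only if some input is newer than the oldest output.
    oldest_output = min(os.path.getmtime(path) for path in outputs)
    updated = [
        path
        for path in inputs
        if os.path.exists(path) and os.path.getmtime(path) > oldest_output
    ]
    if updated:
        return ["Updated input files: " + ", ".join(updated)]
    return []


if __name__ == "__main__":
    # Hypothetical example using file names from the log above.
    print(rerun_reasons(
        [".tmp/reversed_reads/mESCWT-rep1-input_run1.fq.gz"],
        [".tmp/mapping_unsort/mESCWT-rep1-input_run1_genes.bam"],
    ))
```

Under this model, a step whose outputs are all present and newer than its inputs is skipped. That is why the pipeline can resume from the failed step instead of starting over.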

Answer 5 · 2023-11-09T07:36:58.000Z

Hi, I have been running this pipeline for two days, but it is still in rule map_to_genes_by_bowtie2 for WT-Testis-2-IP_run1.fq.gz, and the output file is still very small. Could you tell me why this happens and how to deal with it?

Thanks!!

[Tue Nov 7 23:59:45 2023]
rule map_to_genes_by_bowtie2:
input: .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz, internal_files/mapping_index/genes.1.bt2
output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq, report_reads/mapping/WT-Testis-2-IP_run1_genes.report
jobid: 29
reason: Missing output files: report_reads/mapping/WT-Testis-2-IP_run1_genes.report, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq; Input files updated by another job: internal_files/mapping_index/genes.1.bt2, .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz
wildcards: sample=WT-Testis-2-IP, rn=run1
threads: 20
resources: tmpdir=/tmp

Answer 6 · 2023-11-09T07:43:22.000Z

Could you attach the reference FASTA you used for the genes mapping? This step is only for masking ribosomal reads, so it should not take this long.

Answer 7 · 2023-11-09T07:49:51.000Z

Mus_musculus.GRCm39.ncrna.zip

Answer 8 · 2023-11-09T07:54:32.000Z

Sorry, it's not an rRNA FASTA! My mistake.

Answer 9 · 2023-11-09T08:00:00.000Z

Thank you for the information. Yes, this step is for masking rRNA or tRNA reads only. Mapping the whole transcriptome using this setting would take an extremely long time.
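To keep this step limited to rRNA/tRNA masking, one option is to extract just those records from the Ensembl ncRNA FASTA (such as the Mus_musculus.GRCm39.ncrna file above) before using it as the genes reference. This is a minimal sketch that assumes Ensembl-style headers carrying a transcript_biotype:<type> tag. The biotype set KEEP_BIOTYPES and the function names are hypothetical choices, not pipeline settings.

```python
import gzip
import re
import sys
from typing import Iterable, Iterator, Set, TextIO

# Assumption: Ensembl-style headers with "transcript_biotype:<type>".
# Adjust this set to whatever the masking step should cover.
KEEP_BIOTYPES = {"rRNA", "Mt_rRNA", "Mt_tRNA"}
BIOTYPE_RE = re.compile(r"transcript_biotype:(\S+)")


def filter_fasta(lines: Iterable[str], keep: Set[str] = KEEP_BIOTYPES) -> Iterator[str]:
    """Yield only the FASTA records (header plus sequence lines) whose biotype is in `keep`."""
    keeping = False
    for line in lines:
        if line.startswith(">"):
            match = BIOTYPE_RE.search(line)
            keeping = match is not None and match.group(1) in keep
        if keeping:
            yield line


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed FASTA file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names): python filter_ncrna.py in.fa.gz > rRNA_tRNA.fa
    with open_text(sys.argv[1]) as handle:
        sys.stdout.writelines(filter_fasta(handle))
```

Headers without a biotype tag are dropped, so check the output size before building the bowtie2 index from it.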

Answer 10 · 2023-11-09T08:30:22.000Z

I've hit a Perl problem. My machine has Perl 5.16, and I run apptainer inside a mamba environment, but when the .sif container runs bowtie2 it reports: perl: symbol lookup error: /root/perl5/lib/perl5/x86_64-linux-thread-multi/auto/Cwd/Cwd.so: undefined symbol: Perl_xs_version_bootcheck
Thanks again!!!

Answer 11 · 2023-11-09T19:52:36.000Z

Hi @hanguojun007. The Docker environment won't be affected by the Perl on your host machine. I am not sure whether apptainer or the pipeline triggered this error. Could you send me the full log for debugging?

Answer 12 · 2023-11-10T01:00:03.000Z

I ran apptainer pull docker://y9ch/bidseq to get bidseq_latest.sif, and then:
apptainer run -B /workplace bidseq_latest.sif
When it reached bowtie2, I got:
[Thu Nov 9 17:11:02 2023]
rule map_to_genes_by_bowtie2:
input: .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz, internal_files/mapping_index/genes.1.bt2
output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq, report_reads/mapping/WT-Testis-2-IP_run1_genes.report
jobid: 29
reason: Missing output files: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, report_reads/mapping/WT-Testis-2-IP_run1_genes.report; Input files updated by another job: internal_files/mapping_index/genes.1.bt2, .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz
wildcards: sample=WT-Testis-2-IP, rn=run1
threads: 20
resources: tmpdir=/tmp

Error: BamOpen { target: "-" }
[main_samview] fail to read the header from "-".
[Thu Nov 9 17:11:02 2023]
Error in rule map_to_genes_by_bowtie2:
jobid: 29
output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq, report_reads/mapping/WT-Testis-2-IP_run1_genes.report
shell:

    export LC_ALL=C
    /pipeline/micromamba/bin/bowtie2 -p 20             --end-to-end --ma 0 --score-min L,4,-0.5 -D 20 -R 3 -L 8 -N 1 -i S,1,0.5 --mp 6,3 --rdg 1,2 --rfg 6,3 --norc -a             --no-unal --un .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq -x internal_files/mapping_index/genes -U .tmp/reversed_reads/WT-Testis-2-IP_run1.fq.gz 2>report_reads/mapping/WT-Testis-2-IP_run1_genes.report |             /pipeline/bin/samFilter |             /pipeline/micromamba/bin/samtools view -O BAM -o .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.bam
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job map_to_genes_by_bowtie2 since they might be corrupted:
report_reads/mapping/WT-Testis-2-IP_run1_genes.report
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-11-09T163310.814984.snakemake.log

Another question: does the STAR version need to be >= 2.7.10? My STAR index was built with STAR 2.5.1, and I got:
[Thu Nov 9 18:48:17 2023]
rule map_to_genome_by_star:
input: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq, .tmp/mapping_rerun/WT-Testis-2-IP_run1_genes.fq
output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam, discarded_reads/WT-Testis-2-IP_run1_unmapped.fq.gz, report_reads/mapping/WT-Testis-2-IP_run1_genome.report, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.out, .tmp/star_mapping/WT-Testis-2-IP_run1_SJ.out.tab, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.progress.out, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.std.out
jobid: 31
reason: Missing output files: report_reads/mapping/WT-Testis-2-IP_run1_genome.report, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam; Input files updated by another job: .tmp/mapping_rerun/WT-Testis-2-IP_run1_genes.fq, .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq
wildcards: sample=WT-Testis-2-IP, rn=run1
threads: 20
resources: tmpdir=/tmp

EXITING because of FATAL ERROR: Genome version: 20201 is INCOMPATIBLE with running STAR version: 2.7.10b
SOLUTION: please re-generate genome from scratch with running version of STAR, or with version: 2.7.4a

Nov 09 18:48:18 ...... FATAL ERROR, exiting
[Thu Nov 9 18:48:18 2023]
Error in rule map_to_genome_by_star:
jobid: 31
output: .tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam, discarded_reads/WT-Testis-2-IP_run1_unmapped.fq.gz, report_reads/mapping/WT-Testis-2-IP_run1_genome.report, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.out, .tmp/star_mapping/WT-Testis-2-IP_run1_SJ.out.tab, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.progress.out, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.std.out
shell:

    ulimit -n 20000
    rm -f .tmp/star_mapping/WT-Testis-2-IP_run1_Unmapped.out.mate1
    mkfifo .tmp/star_mapping/WT-Testis-2-IP_run1_Unmapped.out.mate1
    cat .tmp/star_mapping/WT-Testis-2-IP_run1_Unmapped.out.mate1 | gzip > discarded_reads/WT-Testis-2-IP_run1_unmapped.fq.gz &
    /pipeline/micromamba/bin/STAR           --runThreadN 20           --genomeDir /workplace/database/mouse/ucsc_mm39/STAR_index           --readFilesIn .tmp/mapping_unsort/WT-Testis-2-IP_run1_genes.fq,.tmp/mapping_rerun/WT-Testis-2-IP_run1_genes.fq           --alignEndsType Local           --scoreDelOpen -1           --scoreDelBase -1           --scoreInsOpen -2           --scoreInsBase -2           --outFilterMatchNmin 15           --outFilterMatchNminOverLread 0.8           --outFilterMismatchNmax 10           --outFilterMismatchNoverLmax 0.2           --outFilterIntronMotifs RemoveNoncanonicalUnannotated           --alignSJDBoverhangMin 1           --alignSJoverhangMin 5           --chimSegmentMin 20           --chimOutType WithinBAM HardClip           --chimJunctionOverhangMin 15           --chimScoreJunctionNonGTAG 0           --outFilterMultimapNmax 10           --outFilterMultimapScoreRange 0           --outSAMmultNmax -1           --outMultimapperOrder Random           --outReadsUnmapped Fastx           --outSAMtype BAM Unsorted           --outStd BAM_Unsorted           --outSAMattrRGline ID:WT-Testis-2-IP SM:WT-Testis-2-IP LB:RNA PL:Illumina PU:SE           --outSAMattributes NH HI AS nM NM MD jM jI MC ch           --outFileNamePrefix .tmp/star_mapping/WT-Testis-2-IP_run1_ > .tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam
    mv .tmp/star_mapping/WT-Testis-2-IP_run1_Log.final.out report_reads/mapping/WT-Testis-2-IP_run1_genome.report
    rm .tmp/star_mapping/WT-Testis-2-IP_run1_Unmapped.out.mate1
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job map_to_genome_by_star since they might be corrupted:
.tmp/mapping_unsort/WT-Testis-2-IP_run1_genome.bam, discarded_reads/WT-Testis-2-IP_run1_unmapped.fq.gz, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.out, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.progress.out, .tmp/star_mapping/WT-Testis-2-IP_run1_Log.std.out
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-11-09T171328.164488.snakemake.log

Answer 13 · 2023-11-10T05:24:28.000Z

The pipeline has been stuck at this step for hours, and top shows no pipeline processes. But there is no error.

[Fri Nov 10 11:06:58 2023]
rule reverse_reads:
input: .tmp/trimmed_reads/WT-Testis-2-Input_run1_cut.fq.gz
output: .tmp/reversed_reads/WT-Testis-2-Input_run1.fq.gz
jobid: 18
reason: Missing output files: .tmp/reversed_reads/WT-Testis-2-Input_run1.fq.gz; Input files updated by another job: .tmp/trimmed_reads/WT-Testis-2-Input_run1_cut.fq.gz
wildcards: sample=WT-Testis-2-Input, rn=run1
resources: tmpdir=/tmp

[Fri Nov 10 11:06:58 2023]
rule gap_realign:
input: .tmp/mapping_unsort/WT-Testis-1-IP_run1_genes.bam
output: .tmp/mapping_realigned_unsorted/WT-Testis-1-IP_run1_genes.cram
jobid: 28
reason: Missing output files: .tmp/mapping_realigned_unsorted/WT-Testis-1-IP_run1_genes.cram; Input files updated by another job: .tmp/mapping_unsort/WT-Testis-1-IP_run1_genes.bam
wildcards: sample=WT-Testis-1-IP, rn=run1, reftype=genes
resources: tmpdir=/tmp

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Select jobs to execute...
[Fri Nov 10 11:07:00 2023]
Finished job 18.
27 of 90 steps (30%) done
Removing temporary output .tmp/trimmed_reads/WT-Testis-2-Input_run1_cut.fq.gz.
Select jobs to execute...

Answer 14 · 2023-11-11T06:54:02.000Z

Hi @hanguojun007! Thank you for providing the debugging information. The container runs STAR 2.7.10b, which cannot read an index built with STAR 2.5.1, so the index must be regenerated with a recent STAR release. In the future, I plan to have the pipeline generate the index automatically. For now, you'll need to rebuild the STAR index on your end.
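Before rebuilding, you can check which format an existing index uses. This is a minimal sketch that assumes the index directory contains STAR's genomeParameters.txt with a versionGenome line. Per the log, the 2.5.1-built index reports 20201, and the error message names 2.7.4a as the version to regenerate with. Treating 2.7.4a as the minimum, and bare numbers as the old format, are assumptions. The helper names are hypothetical.

```python
import os
import re
import sys
from typing import Optional

# Assumed from the error message ("... or with version: 2.7.4a").
MIN_RELEASE = (2, 7, 4)
RELEASE_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def read_genome_version(star_index_dir: str) -> Optional[str]:
    """Return the versionGenome value from genomeParameters.txt, or None if absent."""
    params_path = os.path.join(star_index_dir, "genomeParameters.txt")
    with open(params_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "versionGenome":
                return fields[1]
    return None


def needs_rebuild(version: Optional[str]) -> bool:
    """True if the recorded index version predates the 2.7.4a format."""
    if version is None:
        return True  # unknown format: safest to regenerate
    match = RELEASE_RE.match(version)
    if match is None:
        return True  # bare numbers such as 20201 mark the older format
    return tuple(int(part) for part in match.groups()) < MIN_RELEASE


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_star_index.py /workplace/database/mouse/ucsc_mm39/STAR_index
    found = read_genome_version(sys.argv[1])
    verdict = "rebuild needed" if needs_rebuild(found) else "format looks compatible"
    print(f"versionGenome={found}: {verdict}")
```

If it reports that a rebuild is needed, regenerate the index with the container's own STAR (/pipeline/micromamba/bin/STAR in the log):

STAR --runMode genomeGenerate --genomeDir <index dir> --genomeFastaFiles <genome fasta> --sjdbGTFfile <annotation gtf>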
bowtie2 errors are hard to diagnose because bowtie2 doesn't produce clear error logs. However, in my experiments, most bowtie2 errors were caused by running out of memory.

The gap realignment step (rule gap_realign) appears to be taking longer than expected. A slow run here suggests that your dataset contains many reads with gaps. However, I find it unlikely that the pseudouridine (psU) level is that high. I suspect that the adapter sequence isn't completely trimmed, which would create alignment artifacts. To confirm this, you can inspect the BAM file.
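One way to run that check is to pipe samtools view output for the genes BAM (for example .tmp/mapping_unsort/WT-Testis-1-IP_run1_genes.bam from the log, while it still exists) into a short script. Among primary mapped reads, the script reports the fraction with a deletion (D) in the CIGAR and the fraction containing the adapter in either orientation. Both orientations are checked because reads pass through rule reverse_reads before mapping. This is a minimal sketch. The ADAPTER default is the common Illumina TruSeq prefix and is an assumption, so substitute the adapter from your library prep.

```python
import sys
from typing import Dict, Iterable

# Assumption: common Illumina TruSeq adapter prefix; replace with your own adapter.
ADAPTER = "AGATCGGAAGAGC"
SKIP_FLAGS = 4 | 256 | 2048  # unmapped, secondary, supplementary


def reverse_complement(seq: str) -> str:
    """Reverse-complement a DNA string (A/C/G/T/N)."""
    return seq[::-1].translate(str.maketrans("ACGTN", "TGCAN"))


def summarize_sam(lines: Iterable[str], adapter: str = ADAPTER) -> Dict[str, float]:
    """Among primary mapped reads, report the fraction with a deletion (D) in the
    CIGAR and the fraction whose sequence contains the adapter in either orientation."""
    targets = (adapter, reverse_complement(adapter))
    mapped = with_deletion = with_adapter = 0
    for line in lines:
        if line.startswith("@"):
            continue  # header lines appear only with samtools view -h
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or int(fields[1]) & SKIP_FLAGS:
            continue  # malformed line, or not a primary mapped alignment
        cigar, seq = fields[5], fields[9]
        mapped += 1
        if "D" in cigar:
            with_deletion += 1
        if any(target in seq for target in targets):
            with_adapter += 1
    return {
        "mapped": mapped,
        "deletion_fraction": with_deletion / mapped if mapped else 0.0,
        "adapter_fraction": with_adapter / mapped if mapped else 0.0,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: samtools view .tmp/mapping_unsort/<sample>_genes.bam | python check_bam.py -
    if sys.argv[1] == "-":
        print(summarize_sam(sys.stdin))
    else:
        with open(sys.argv[1]) as handle:  # a plain-text SAM file
            print(summarize_sam(handle))
```

A high adapter_fraction alongside a high deletion_fraction would be consistent with incomplete trimming rather than genuine modification signal.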

Answer 15 · 2023-11-11T07:04:06.000Z

By the way, could you give more information about your library preparation method? Also, would you mind opening a separate issue for each bug? That would help other users find the information they need.