comprna/RATTLE

rattle correction step giving error

dvirdi01 opened this issue · 9 comments

I ran rattle correct on my input files through snakemake. I get an error message saying this:

Error in rule cluster_correction:
jobid: 13
input: data/.../.../samplefile.fastq
output: data/RATTLE_out/samplefile/corrected.fq, data/RATTLE_out/samplefile/uncorrected.fq, data/RATTLE_out/samplefile/consensi.fq
log: log/RATTLE_log/samplefile_correct.out, log/RATTLE_log/samplefile_correct.err (check log file(s) for error details)
shell:
/storage/.../.../bin/RATTLE/rattle correct -i data/.../.../samplefile.fastq -c data/RATTLE_out/samplefile/clusters.out -o data/RATTLE_out/samplefile/corrected.fq data/RATTLE_out/samplefile/uncorrected.fq data/RATTLE_out/samplefile/consensi.fq -t 48 > log/RATTLE_log/samplefile.out 2> log/RATTLE_log/samplefile.err
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Error executing rule cluster_correction on cluster (jobid: 13, external: 2761217, jobscript: /storage/.../.../.../.snakemake/tmp.tz0fhacf/snakejob.cluster_correction.13.sh). For error details see the cluster log and the log files of the involved rule(s).

When I open samplefile.err it says: "Reading fasta file... Done" and when I open samplefile.out it is empty.

I also get this message below:

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=2761217.0 cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.
srun: error: valiant1: task 0: Out Of Memory
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=2761217.batch cgroup. Some of your processes may have been killed by the cgroup out-of-memory handler.

I gave it 100 GB of RAM to begin with, but I guess that wasn't enough. Is there a way to know how much RAM I need to allocate before I run the snakemake command?

Hi,

You can have a look at the memory usage figure in our paper.

Otherwise, I need more information, like the number of reads or the fastq file size, to give you a RAM estimate.

Your 'samplefile.out' file should not be empty, because it is a binary file; check its file size rather than opening it to see whether it is empty.

Eileen
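A quick way to gather the numbers asked for above (read count, fastq size) and to check emptiness by file size instead of by opening the file. The tiny FASTQ written here is only a stand-in so the snippet runs anywhere; point the commands at your real sample file instead:

```shell
# Stand-in FASTQ so the commands below are runnable; substitute your real file.
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGC\n+\nIIII\n' > demo.fastq

# A FASTQ record is 4 lines, so reads = lines / 4.
n_reads=$(( $(wc -l < demo.fastq) / 4 ))
echo "reads: $n_reads"

# Report size on disk, and test emptiness by size (-s is true if non-empty).
du -h demo.fastq
[ -s demo.fastq ] && echo "demo.fastq is non-empty"
```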

Hi, I checked the output files for some of the processes that did run. It created consensi.fq, uncorrected.fq and corrected.fq, but they are all 0 bytes. I am not sure why this is happening. This was my snakemake rule:

rule cluster_correction:
    input: "data/.../.../{sample}.fastq"
    output:
        touch("data/.../{sample}/corrected.fq"),
        touch("data/.../{sample}/uncorrected.fq"),
        touch("data/.../{sample}/consensi.fq")
    params:
        clusters = "data/.../{sample}/clusters.out"
    log:
        out = "log/.../{sample}_correct.out",
        err = "log/.../{sample}_correct.err"
    threads:
        48
    resources:
        mem = 100
    shell:
        """/storage/.../.../.../.../rattle correct \
        -i {input} \
        -c {params.clusters} \
        -o {output} \
        -t {threads} \
        > {log.out} \
        2> {log.err}
        """

To add on: the same thing happened with my rattle cluster_summary step: it created a tsv file, but it was also 0 bytes.

Hi,

This problem seems to come not from the error correction step but from the clustering step.

Please provide answers to the following questions to help us identify the issues and provide solutions.

  1. Is your clustering step output (clusters.out) file size 0 bytes?
  2. What is your clustering step command? And what is the log for your clustering step?
  3. Did you hit the out-of-memory issue in your clustering step? Normally, clustering uses more memory than error correction.

Eileen

  1. None of my clusters.out files are 0 bytes, so I think the cluster and cluster extraction steps worked.
  2. This was my rule for the clustering step:

input: "data/.../..../{sample}.fastq.gz"
output:
    touch("data/.../{sample}.done")
params:
    outdir = "data/..../{sample}"
log:
    out = "log/.../{sample}.out",
    err = "log/.../{sample}.err"
threads:
    48
resources:
    mem = 200
shell:
    """mkdir -p {params.outdir};
    /storage/.../.../.../.../rattle cluster \
    --input {input} \
    --output {params.outdir} \
    --threads {threads} \
    --verbose \
    > {log.out} \
    2> {log.err}"""

In my log, my sample.out file says "Reads: ...some number..." and my sample.err says:

[================================================================================] 67715/67715 (100%)
Iteration 0.3 complete
[================================================================================] 24054/24054 (100%)
Iteration 0.25 complete
[================================================================================] 11360/11360 (100%)
Iteration 0.2 complete
[================================================================================] 7204/7204 (100%)
Iteration 0 complete
Gene clustering done
5507 gene clusters found

  3. I think I did for some of the files. For those, I re-ran with more memory allocated.

Hi,

Your RATTLE error correction step command is incorrect. To specify the outputs, you don't need to list all the output files' names and locations; you only need an output folder location, like -o [out_dir].

Hope this helps.
Eileen
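A hedged sketch of what the corrected invocation could look like under that advice: -o names a directory, and the output files are created inside it. The paths and file names here are placeholders, and the command is only echoed rather than run, since rattle may not be on PATH:

```shell
outdir="RATTLE_out/samplefile"
mkdir -p "$outdir"

# -o now points at the output directory instead of listing the three
# output files (per the advice above); paths are placeholders.
cmd="rattle correct -i samplefile.fastq -c $outdir/clusters.out -o $outdir -t 48"
echo "$cmd"
```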

  1. Hi, isn't that what I did, though? I gave the output file location as params.outdir?

Edit: Oh, I think I get what you were saying.
I had this earlier for my error correction step in my smk file:

output:
    touch("data/.../{sample}/corrected.fq"),
    touch("data/.../{sample}/uncorrected.fq"),
    touch("data/.../{sample}/consensi.fq")

but I should change it to-

output:
    touch("data/.../{sample}")

Is this ^ what you meant? Also, in my snakefile I had:

rule all:
    input:
        expand("data/..../{sample}/{filename}.fq",
               sample = config['samples'], filename = ["corrected", "uncorrected", "consensi"])

Would I need to change the expand command in my snakefile?


  2. Also, what about my cluster_summary.tsv file being empty? Was it due to the same error? I did not run the cluster extraction and cluster summary steps from snakemake; I ran them directly from the command line for all my files. This is what I had:
./rattle extract_clusters -i /storage/.../.../.../.../.../.../sample.fastq  -c /storage/.../.../.../.../.../sample/clusters.out -o /storage/.../..../.../.../.../sample/clusters --fastq

./rattle cluster_summary -i /storage/.../.../.../.../.../.../sample.fastq -c /storage/.../.../.../.../.../sample/clusters.out > /storage/.../.../.../.../.../sample/cluster_summary.tsv

Why did this command produce an empty tsv file?
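One way to sanity-check this kind of failure before re-running: confirm that both inputs exist and that the produced tsv is non-empty. The files created below are stand-ins so the loop runs anywhere; substitute the real sample.fastq, clusters.out and tsv paths from the commands above:

```shell
# Stand-in files; substitute your real paths.
echo '@r1' > sample.fastq
echo 'cluster data' > clusters.out
: > cluster_summary.tsv            # empty output, like the one described

# Warn about anything missing or 0 bytes before re-running cluster_summary.
for f in sample.fastq clusters.out cluster_summary.tsv; do
  [ -s "$f" ] || echo "WARNING: $f is missing or 0 bytes"
done
```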

  1. Your new output command is correct.
    If you want to use multiple fastq files as input, the format should be -i input_1.fq,input_2.fq,...,input_n.fq. All files must be separated by commas; no spaces or line breaks are allowed. Don't use the Snakemake expand for RATTLE input, since expand will insert new lines.
    Also, I don't understand why you are using corrected.fq, uncorrected.fq, consensi.fq as input. This would make your input and output the exact same files.

  2. Your command looks correct.
    Possible issues:
    The inputs of the cluster step and the cluster_summary step are not the same.
    Your input.fastq file or clusters.out file location is incorrect.

extract_clusters and cluster_summary are designed to make the cluster step's results readable. Only the cluster step is necessary before the correction step.
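The comma-separated input format described above can be built mechanically rather than by hand. A small sketch (the fastq names and filelist.txt are hypothetical):

```shell
# Join several fastq names with commas, no spaces or newlines, as -i expects.
# paste -s -d, flattens the list onto one comma-delimited line.
printf 'input_1.fq\ninput_2.fq\ninput_3.fq\n' > filelist.txt
inputs=$(paste -s -d, filelist.txt)
echo "$inputs"
```

Inside a Snakemake rule, an equivalent approach is to join in Python, e.g. a params entry like `joined = lambda wc, input: ",".join(input)` and then `-i {params.joined}` in the shell string.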