Segmentation fault (core dumped)
BJ-Chen-Eric opened this issue · 14 comments
Hi, thanks for developing the tool. As the title says, when I run rattle cluster it returns "Segmentation fault (core dumped)".
This is the command:
rattle/rattle cluster -i ~/Analysis/data/process/rna1/rna1.filter.fastq.gz -o ~/Analysis/tool/isoform_detection/rat/ --iso --rna
and the output is
RNA mode: 1
Reading fasta file... Done
Segmentation fault (core dumped)
The input file has fewer than 500 thousand reads, and the machine has 16 cores / 32 threads with 1 TB of memory. A previous discussion suggested that limited memory might be the problem, but my input has far fewer reads than that. I hope someone can help discuss this.
Hi, from what I see in the command, you are trying to use a compressed fastq file, which RATTLE doesn't support as of now. You will need to uncompress it first.
Also, be sure to filter out small reads (we usually filter out those smaller than 150bp).
Best,
Ivan
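For reference, a length filter like the one suggested above can be done with plain awk; the sketch below builds a tiny throwaway fastq (placeholder data) so it is self-contained — substitute your own file names. Since RATTLE does not read gzipped input, decompress first, e.g. with gunzip -k reads.fastq.gz.

```shell
# Build a tiny demo fastq: one 100 bp read and one 200 bp read (placeholder data)
printf '@short\n%s\n+\n%s\n' "$(head -c 100 /dev/zero | tr '\0' 'A')" "$(head -c 100 /dev/zero | tr '\0' 'I')" > demo.fastq
printf '@long\n%s\n+\n%s\n' "$(head -c 200 /dev/zero | tr '\0' 'A')" "$(head -c 200 /dev/zero | tr '\0' 'I')" >> demo.fastq

# Keep only fastq records whose sequence line is at least 150 bp
awk 'BEGIN{OFS="\n"} {h=$0; getline s; getline p; getline q; if (length(s) >= 150) print h, s, p, q}' demo.fastq > demo.filtered.fastq

grep -c '^@' demo.filtered.fastq   # prints 1 (only the 200 bp read survives)
```

Dedicated tools such as seqkit can do the same filtering, but the awk one-liner needs nothing beyond a standard shell.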
Thanks for your rapid response. I tried again with an uncompressed and filtered file, but the result is the same. Another update: when I use the split.fastq file as RATTLE input, it runs, but the output is empty. Should I provide the fastq file to help figure out the problem?
Best wishes
Hi! Yes, please send me the fastq file if that's possible to ivan.delarubia@upf.edu
I have 128 CPUs with 1 TB of memory, but I still cannot run the command. Could you help me?
Hi there,
Do you run into any problems when using RATTLE with the example toyset dataset? Can you please check whether your reads contain any invalid bases? RATTLE could run into this issue when generating kmers with reads containing invalid bases.
Hope this helps,
Eileen
I did not encounter any issues when using RATTLE with the example toyset dataset. I used fastp to filter out low-quality bases, but both the original fastq file (20 GB) and the trimmed fastq file (12 GB) ran into the same problem with RATTLE. What do "invalid bases" refer to? Does it mean base 'N'? When I counted the bases in my fastq file, I did not observe any 'N' bases. Could you help me check this issue? :)
Hi,
Valid bases are A, T, C, G, and U. All other bases in reads are considered invalid, including 'N'. No need to worry about 'N'; RATTLE will filter it out.
Could you please provide your RATTLE command? If possible, can you run RATTLE on your dataset with the '--verbose' flag and share the progress output? That would give me more information to identify why and where RATTLE went wrong.
Thanks,
Eileen
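A quick way to scan a fastq for bases outside the A/C/G/T/U/N alphabet is to test only the sequence lines (every 4th line, starting at line 2). A minimal sketch on a throwaway demo file — substitute your own fastq:

```shell
# Demo fastq with one read containing an invalid character 'X' (placeholder data)
printf '@r1\nACGTX\n+\nIIIII\n@r2\nACGTU\n+\nIIIII\n' > demo.fastq

# Count sequence lines with any character outside A/C/G/T/U/N (upper- or lowercase)
awk 'NR % 4 == 2 && /[^ACGTUNacgtun]/ {n++} END {print n+0}' demo.fastq   # prints 1
```

A nonzero count means some reads carry characters RATTLE may not expect when building k-mers.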
Dear Eileen,
Thank you for your assistance! I retried filtering the data with fastp, and fortunately I got the desired result when running the same commands. However, when processing the original fastq file with the command you suggested, I ran into difficulties and couldn't understand the cause. It is possible that my original fastq file contains duplicate bases and low-quality bases, which could be the reason for the issues. The following figures show my command and the error.
Do you have very short or extremely long reads in your input? E.
Yes, you might be correct. I checked my fastq file and confirmed that it contains reads longer than 150 bp. However, I had not checked the length of the longest reads.
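Checking the longest read is a one-liner, since the sequence is every 4th line of a fastq (starting at line 2). A minimal sketch on placeholder demo data:

```shell
# Demo fastq: an 8 bp read and a 3 bp read (placeholder data)
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACG\n+\nIII\n' > demo.fastq

# Track the longest sequence line seen
awk 'NR % 4 == 2 && length($0) > max { max = length($0) } END { print max }' demo.fastq   # prints 8
```

Running this on the real input would show whether extremely long reads are present, as asked above.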
Facing this issue:
==========================================
SLURM_JOB_ID = 2797830
SLURM_NODELIST = hm02
Starting at Tue May 21 18:49:46 CDT 2024
Job name: Rattle, Job ID: 2797830
I have 4 CPUs on compute node hm02
RNA mode: true
Reading fasta file...
Reads: 10527128
Done
/var/spool/slurmd/job2797830/slurm_script: line 82: 2113221 Killed ./rattle cluster -i "$filtered_file" -o "$output_folder" --rna -B 0.5 -b 0.3 -f 0.2
Using the bigmem partition of our cluster:
$ slurminfo
QUEUE FREE TOTAL FREE TOTAL RESORC OTHER MAXJOBTIME CORES NODE GPU
PARTITION CORES CORES NODES NODES PENDING PENDING DAY-HR:MN /NODE MEM-GB (COUNT)
bigmem 48 96 0 2 0 0 7-00:00 48 1500 -
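The "Killed" line in the log above is the kernel's out-of-memory killer, not a segfault, so the job likely needs an explicit memory request on the bigmem partition. A sketch of the relevant sbatch header — the exact values are assumptions to adjust for your cluster:

```shell
#!/bin/bash
#SBATCH --job-name=Rattle
#SBATCH --partition=bigmem      # 1500 GB/node per the slurminfo output above
#SBATCH --cpus-per-task=48
#SBATCH --mem=1400G             # request most of the node's memory (assumed value)
#SBATCH --time=7-00:00:00       # matches the partition's 7-00:00 limit
```

After a run, sacct -j <jobid> --format=JobID,State,MaxRSS shows whether a step ended in OUT_OF_MEMORY and what its peak memory was.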
My SLURM script looks like this:
echo "Starting at $(date)"
echo "Job name: ${SLURM_JOB_NAME}, Job ID: ${SLURM_JOB_ID}"
echo "I have
# Navigate to Porechop directory
cd /home/rkumar/Porechop || exit
# Define input and output paths
input_file="/scratch/g/........./Nanopore_cDNA/A3HE/A3HE.fastq"
output_folder="/scratch/g/...../Nanopore_cDNA/A3HE/Rattle_A3HE"
# Check if input file exists
if [ ! -f "$input_file" ]; then
echo "Error: Input file not found!"
exit 1
fi
# Create output directory if it does not exist
mkdir -p "$output_folder/clusters"
# Step 1: Filter reads by length (if needed, adjust according to your data)
filtered_file="${input_file%.fastq}_filtered.fastq"
porechop -i "$input_file" -o "$filtered_file" --discard_middle --min_split_read_size 150
# Check if filtered file was created
if [ ! -f "$filtered_file" ]; then
echo "Error: Filtered file not created!"
exit 1
fi
# Navigate to RATTLE directory
cd /home/rkumar/RATTLE || exit
# Step 2: Run the RATTLE commands
./rattle cluster -i "$filtered_file" -o "$output_folder" --rna -B 0.5 -b 0.3 -f 0.2
./rattle cluster_summary -i "$filtered_file" -c "$output_folder/clusters.out" > "$output_folder/cluster_summary.tsv"
./rattle extract_clusters -i "$filtered_file" -c "$output_folder/clusters.out" -o "$output_folder/clusters" --fastq
# Step 3: Correct reads
./rattle correct -i "$filtered_file" -c "$output_folder/clusters.out" -o "$output_folder"
# Step 4: Merge consensi files and run polishing step
consensi_file="$output_folder/consensi.fq"
cat "$output_folder"/*/consensi.fq > "$consensi_file"
# Check if consensi file was created
if [ ! -f "$consensi_file" ]; then
echo "Error: Consensi file not created!"
exit 1
fi
./rattle polish -i "$consensi_file" -o "$output_folder" --rna
echo "Finished at $(date)"
# Periodically log memory usage
while true; do
echo "Memory usage at $(date):"
free -h
sleep 600 # Log every 10 minutes
done &
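One caveat with the backgrounded logging loop above: if the script exits without stopping it, the loop can linger after the job. A sketch of the same pattern with cleanup, using a placeholder log file and a short sleep as a stand-in for the real rattle pipeline:

```shell
# Background logger: remember its PID and kill it when the script exits,
# so no stray loop is left running after sbatch finishes
( while true; do date; sleep 600; done ) > mem.log &
LOGGER_PID=$!
trap 'kill "$LOGGER_PID" 2>/dev/null' EXIT

sleep 1   # stand-in for the real pipeline; logger writes at least one entry
test -s mem.log && echo "logger wrote output"
```

In the real script, free -h would replace date inside the loop, and the trap ensures the logger dies with the job.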