comprna/RATTLE

Segmentation fault (core dumped)

BJ-Chen-Eric opened this issue · 14 comments

Hi, thanks for developing the tool. As the title says, running `rattle cluster` returns "Segmentation fault (core dumped)".
This is the command:

rattle/rattle cluster -i ~/Analysis/data/process/rna1/rna1.filter.fastq.gz -o ~/Analysis/tool/isoform_detection/rat/ --iso --rna

and the output is

RNA mode: 1
Reading fasta file... Done
Segmentation fault (core dumped)

The input file has fewer than 500 thousand reads, and the machine has 16 cores/32 threads with 1 TB of memory. From previous discussions, limited memory might be the problem, but my input seems far too small for that. I hope someone can help me figure this out.

Hi, from your command I can see you are trying to use a compressed fastq file, which RATTLE doesn't currently support. You will need to decompress it first.

Also, be sure to filter out small reads (we usually filter out those smaller than 150bp).
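Both steps can be sketched in shell. This is a minimal illustration, not RATTLE's own preprocessing: the file names are hypothetical, and a tiny two-read FASTQ is fabricated here so the example is self-contained; point the commands at your real reads instead.

```shell
# Fabricate a demo FASTQ (one 200 bp read, one 4 bp read) standing in for real data
printf '@r1\n%s\n+\n%s\n@r2\nACGT\n+\nIIII\n' \
    "$(printf 'A%.0s' $(seq 1 200))" "$(printf 'I%.0s' $(seq 1 200))" > rna1.filter.fastq
gzip -f rna1.filter.fastq

# 1) RATTLE cannot read gzipped input, so decompress first (-k keeps the .gz)
gunzip -kf rna1.filter.fastq.gz

# 2) Keep only reads >= 150 bp (assumes standard 4-line FASTQ records)
awk 'NR % 4 == 1 { h = $0; getline s; getline p; getline q;
     if (length(s) >= 150) print h "\n" s "\n" p "\n" q }' \
    rna1.filter.fastq > rna1.len150.fastq
# rna1.len150.fastq now contains only the 200 bp read r1
```

Tools like `seqkit seq -m 150` do the same length filter, but plain `awk` avoids an extra dependency.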

Best,
Ivan

Thanks for your rapid response. I tried again with an uncompressed and filtered file, but the result is the same. Another update: when I use the split.fastq file as RATTLE input, it runs, but the output is empty. Should I provide the fastq file to help figure out the problem?

Best wishes

Hi! Yes, please send me the fastq file if that's possible to ivan.delarubia@upf.edu

I encountered the same issue.

I have 128 CPUs and 1 TB of memory, but I still cannot run the command. Could you help me?

Hi there,

Do you run into any problems when using RATTLE with the example toyset dataset? Could you also check whether your reads contain any invalid bases? RATTLE can hit this issue when generating k-mers from reads that contain invalid bases.

Hope this helps,
Eileen

I did not encounter any issues while using RATTLE with the example toyset dataset. I used fastp to filter out low-quality bases, but both the original fastq file (20 GB) and the trimmed fastq file (12 GB) hit the same problem in RATTLE. What does "invalid bases" refer to? Does it mean the base 'N'?


To add to my previous message: I also counted the bases in my fastq file and did not observe any 'N' bases. Could you help me check this issue? :)

Hi,

Valid bases are A, T, C, G, U. Any other character in a read is considered invalid, including 'N'. No need to worry about 'N', though: RATTLE will filter it out.

Could you please provide your RATTLE command? If possible, could you also run RATTLE on your dataset with the '--verbose' flag and share the progress output? That would give me more information to identify why and where RATTLE goes wrong.

Thanks,
Eileen
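The invalid-base check above can be done with a one-liner. A sketch with `awk`, where a tiny demo file stands in for the real reads (point `awk` at your own FASTQ instead):

```shell
# Demo FASTQ: one read with an 'N', one clean read
printf '@r1\nACGTN\n+\nIIIII\n@r2\nACGT\n+\nIIII\n' > demo.fastq

# Count sequence lines (every 4th line, offset 2) containing characters
# other than A, C, G, T, U (upper or lower case)
awk 'NR % 4 == 2 && /[^ACGTUacgtu]/ { c++ }
     END { print c + 0, "reads with non-ACGTU characters" }' demo.fastq
# -> 1 reads with non-ACGTU characters
```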

Dear Eileen,
Thank you for your assistance! I retried filtering the data with fastp, and fortunately the same commands now produce the desired result. However, when I process the original fastq file with the command you suggested, it still fails and I cannot understand the cause. It is possible that my original fastq file contains duplicate or low-quality bases, which could be the reason for the issue. The following figures show my command and the error.
[screenshots: RATTLE command and error output]

Do you have very short or extremely long reads in your input? E.


Yes, you might be correct. I checked my fastq file and confirmed that it contains reads longer than 150 bp. However, I neglected to check the length of the longest reads.
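The length distribution is easy to check. A sketch with `awk`, using a small demo file in place of the real reads (lengths 10 and 4 here):

```shell
# Demo FASTQ standing in for real data
printf '@r1\nACGTACGTAC\n+\nIIIIIIIIII\n@r2\nACGT\n+\nIIII\n' > demo_len.fastq

# Report count, min, max, and mean read length from the sequence lines
awk 'NR % 4 == 2 { l = length($0); n++; s += l;
       if (min == "" || l < min) min = l; if (l > max) max = l }
     END { printf "reads=%d min=%d max=%d mean=%.1f\n", n, min, max, s / n }' demo_len.fastq
# -> reads=2 min=4 max=10 mean=7.0
```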

Facing this issue:

```
==========================================
SLURM_JOB_ID = 2797830
SLURM_NODELIST = hm02

Starting at Tue May 21 18:49:46 CDT 2024
Job name: Rattle, Job ID: 2797830
I have 4 CPUs on compute node hm02
RNA mode: true
Reading fasta file...
Reads: 10527128
Done
/var/spool/slurmd/job2797830/slurm_script: line 82: 2113221 Killed ./rattle cluster -i "$filtered_file" -o "$output_folder" --rna -B 0.5 -b 0.3 -f 0.2
Reading fasta file... Done
Reading fasta file... Done
```
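A note on the "Killed" line: when a Slurm job ends this way (rather than with a segfault), it is usually the out-of-memory killer, i.e. `rattle cluster` exceeded the memory available to the job. Peak usage can be checked after the fact; a sketch, assuming `sacct` is run on the cluster, with 2797830 being the job ID from the log above:

```shell
# Inspect the job's recorded state and peak resident memory (MaxRSS)
sacct -j 2797830 --format=JobID,State,MaxRSS,ReqMem,Elapsed
```

A State of OUT_OF_MEMORY, or a MaxRSS close to ReqMem, points at memory rather than a crash inside RATTLE itself.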
Using the bigmem partition of our cluster:

```
$ slurminfo
QUEUE      FREE   TOTAL  FREE   TOTAL  RESORC   OTHER    MAXJOBTIME  CORES  NODE    GPU
PARTITION  CORES  CORES  NODES  NODES  PENDING  PENDING  DAY-HR:MN   /NODE  MEM-GB  (COUNT)
bigmem     48     96     0      2      0        0        7-00:00     48     1500    -
```
My Slurm script looks like this (comments restored; the memory logger is started before the pipeline, not after, so it actually captures usage while RATTLE runs):

```shell
#!/bin/bash

echo "Starting at $(date)"
echo "Job name: ${SLURM_JOB_NAME}, Job ID: ${SLURM_JOB_ID}"
echo "I have ${SLURM_CPUS_ON_NODE} CPUs on compute node $(hostname -s)"

# Periodically log memory usage in the background while the pipeline runs
while true; do
    echo "Memory usage at $(date):"
    free -h
    sleep 600  # log every 10 minutes
done &
monitor_pid=$!

# Navigate to Porechop directory
cd /home/rkumar/Porechop || exit

# Define input and output paths
input_file="/scratch/g/........./Nanopore_cDNA/A3HE/A3HE.fastq"
output_folder="/scratch/g/...../Nanopore_cDNA/A3HE/Rattle_A3HE"

# Check if input file exists
if [ ! -f "$input_file" ]; then
    echo "Error: Input file not found!"
    exit 1
fi

# Create output directory if it does not exist
mkdir -p "$output_folder/clusters"

# Step 1: Filter reads by length (adjust to your data if needed)
filtered_file="${input_file%.fastq}_filtered.fastq"
porechop -i "$input_file" -o "$filtered_file" --discard_middle --min_split_read_size 150

# Check if filtered file was created
if [ ! -f "$filtered_file" ]; then
    echo "Error: Filtered file not created!"
    exit 1
fi

# Navigate to RATTLE directory
cd /home/rkumar/RATTLE || exit

# Step 2: Run the RATTLE clustering commands
./rattle cluster -i "$filtered_file" -o "$output_folder" --rna -B 0.5 -b 0.3 -f 0.2
./rattle cluster_summary -i "$filtered_file" -c "$output_folder/clusters.out" > "$output_folder/cluster_summary.tsv"
./rattle extract_clusters -i "$filtered_file" -c "$output_folder/clusters.out" -o "$output_folder/clusters" --fastq

# Step 3: Correct reads
./rattle correct -i "$filtered_file" -c "$output_folder/clusters.out" -o "$output_folder"

# Step 4: Merge consensi files and run polishing step
consensi_file="$output_folder/consensi.fq"
cat "$output_folder"/*/consensi.fq > "$consensi_file"

# Check if consensi file was created
if [ ! -f "$consensi_file" ]; then
    echo "Error: Consensi file not created!"
    exit 1
fi

./rattle polish -i "$consensi_file" -o "$output_folder" --rna

# Stop the memory logger and finish
kill "$monitor_pid" 2>/dev/null
echo "Finished at $(date)"
```