hasindu2008/slow5tools

slow5 files indexing problem in f5c

Closed this issue · 17 comments

Dear slow5tools (@hasindu2008)
I've used slow5tools to convert my single large fast5 files into .blow5 format.
Now I am trying to run f5c index but I get the error below. Could you please suggest a way of dealing with this? Many thanks, Reza
[screenshot]

Hi

Could you try running
slow5tools index /path/to/blow5/file on that blow5 file and see if that also gives a similar error?

Hello @hasindu2008
Thanks for your prompt reply. I tried the suggested command and got a similar error as before. Please see the screenshot below.
[screenshot]

Oh, I guess I know why. This is probably an unhandled error related to the permission issue in #88.
Could you please check whether the directory containing the BLOW5 file is writable?
You can do something like touch /scratch/project/..../kd1_1.slow5/a.txt to see if you have write permission.
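The touch test above can be wrapped in a small helper that also cleans up after itself. This is only a sketch: check_writable is a hypothetical name, not part of slow5tools, and it assumes a POSIX shell with mktemp available. It writes a few bytes as well as creating the file, since an exceeded disk quota can sometimes still allow empty files.

```shell
# check_writable DIR: hypothetical helper, not part of slow5tools.
# Returns 0 if a small file can be created and written in DIR.
check_writable() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # File creation fails on missing write permission (and on some quota limits).
    probe=$(mktemp "$dir/.write_probe.XXXXXX" 2>/dev/null) || {
        echo "cannot create files in $dir" >&2
        return 1
    }
    # Writing bytes catches quotas that still permit creating empty files.
    if ! echo probe > "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "cannot write data in $dir (quota exceeded?)" >&2
        return 1
    fi
    rm -f "$probe"
}
```

Usage would look like check_writable /path/to/blow5/dir && slow5tools index /path/to/blow5/dir/reads.blow5, where both paths are placeholders.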

Thanks a lot. You are correct: I could not write files because the disk quota was exceeded. So I cleared some files and then submitted the job below:

slow5tools index /scratch/project_mnt/S0077/ONT_Reza/KD1_1/slow5/kd1_1.slow5/FAS78697_pass_d1f51769_1364a33e_0.blow5

A new file, blow5.idx, has been generated (please see the screenshot). Could you please tell me whether I can now run f5c eventalign with this index file? I'm interested in differential RNA modification analysis, so I need the eventalign output to run Nanocompore or xPore later. I highly appreciate your time.

[screenshot]

Now you can continue to run f5c index as before. Given that the slow5 index now exists, you can optionally pass the --skip-slow5-idx flag to f5c index to make it faster.

Once that is successful, you can use either f5c eventalign (with --rna for direct RNA) or nanopolish eventalign. Both should give the same answers.
Some examples at https://hasindu2008.github.io/slow5tools/workflows.html could be helpful. Feel free to ask if you encounter an issue.

Thanks a lot. Your help is highly appreciated. I'll keep you posted on indexing with f5c and then f5c eventalign. f5c is much faster than nanopolish in my tests on a few public datasets.

Hi
f5c index on the BLOW5 file generated three files in the directory 'fastq_dir2' (please see the screenshot): basecalled.fastq.index, basecalled.fastq.index.fai, and basecalled.fastq.index.gzi. Does it also need to generate basecalled.fastq.index.readdb?

[screenshot]

I think the run was successful, as I don't see any errors. I've attached a screenshot below for your consideration.
[screenshot]

It seems to have been successful. Give eventalign a go.
f5c index for slow5 does not create a readdb, as the path to the slow5 file is supposed to be given to the other commands using the --slow5 argument.

If you want to use nanopolish on slow5, you can manually create the readdb file as below:
echo -e "*\tpath/to/reads.blow5" > basecalled.fastq.index.readdb
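Note that echo -e is not portable: some shells (for example dash) print the -e literally, which would corrupt the readdb file. A sketch of the same step using printf is below. write_readdb is a hypothetical helper name, and the file names in the usage line are placeholders.

```shell
# write_readdb BLOW5 FASTQ: hypothetical helper, not part of f5c or nanopolish.
# Writes a one-line readdb mapping every read ("*") to the BLOW5 file.
write_readdb() {
    blow5=$1
    fastq=$2
    # printf interprets \t the same way in every POSIX shell.
    printf '*\t%s\n' "$blow5" > "$fastq.index.readdb"
}
```

For example, write_readdb path/to/reads.blow5 basecalled.fastq creates basecalled.fastq.index.readdb next to the FASTQ file.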

Thank you so much for your quick response and guideline. Another question, please.
Do I need to redo the transcriptome mapping with minimap2 for the .blow5 file, or can I use the BAM file generated earlier from the fast5 data, before converting to blow5? Thanks a lot again.

The already generated BAM file will work as long as the read IDs in the BAM file are the same as those in the FASTQ that you input to f5c index.
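One way to check this condition is to count the read IDs that appear in the alignments but not in the FASTQ. The sketch below is an illustration only: count_unmatched_ids is a hypothetical helper, it assumes a plain four-line-per-record FASTQ, and it reads SAM text so that it has no dependencies (for a BAM, convert first with samtools view -h aln.bam > aln.sam).

```shell
# count_unmatched_ids FASTQ SAM: hypothetical helper.
# Prints how many distinct read IDs in SAM are absent from FASTQ.
count_unmatched_ids() {
    fastq=$1
    sam=$2
    ids=$(mktemp)
    # FASTQ header is every 4th line from line 1; the ID is the first word without '@'.
    awk 'NR % 4 == 1 { sub(/^@/, "", $1); print $1 }' "$fastq" | sort -u > "$ids"
    # SAM: drop '@' header lines, take QNAME (column 1), keep IDs missing from the FASTQ.
    grep -v '^@' "$sam" | cut -f1 | sort -u | comm -13 "$ids" - | wc -l | tr -d ' '
    rm -f "$ids"
}
```

A result of 0 means every aligned read has a matching FASTQ record, so the existing BAM should be usable.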

Alternatively, you can map the FASTQ file (not the BLOW5 or FAST5) using Minimap2. FAST5 or BLOW5 files are not directly usable by Minimap2 unless you basecall them (you can basecall BLOW5 directly with Guppy using https://github.com/Psy-Fer/buttery-eel/).

When running f5c eventalign, I'm getting an error:
[screenshot]
I used the minimap2 command below:

module load samtools

./minimap2 -ax map-ont -t 8 -uf -k14 -d /scratch/project_mnt/S0077/ONT_Reza/annotations/Mus_musculus.GRCm39.cdna.all.fa.mmi
/scratch/project_mnt/S0077/ONT_Reza/KD1_1/KD1_1_fast5_pass_gzip/fastq_pass/FAS78697_pass_d1f51769_1364a33e_0.fastq > /scratch/project_mnt/S0077/ONT_Reza/KD1_1/minimap2/aligned.sam

samtools view -Sb /scratch/project_mnt/S0077/ONT_Reza/KD1_1/minimap2/aligned.sam | samtools sort -o /scratch/project_mnt/S0077/ONT_Reza/KD1_1/minimap2/aligned.bam - &>> /scratch/project_mnt/S0077/ONT_Reza/KD1_1/minimap2/aligned.bam.log

samtools index /scratch/project_mnt/S0077/ONT_Reza/KD1_1/minimap2/aligned.bam &>> /scratch/project_mnt/S0077/ONT_Reza/KD1_1/minimap2/aligned.bam.log

The fastq file I used for minimap2 was basecalled by the MinKNOW live basecaller.
Any idea how to get rid of this problem?
Thank you so much.

Are there any warnings or errors in aligned.bam.log for the samtools index command?
Run samtools idxstats on the BAM file to check whether something is wrong with it.
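As a rough guide to reading the idxstats output: each line holds the reference name, its length, the number of mapped reads, and the number of unmapped reads, separated by tabs. The sketch below sums the third column. total_mapped is a hypothetical helper name, and a total of 0 would suggest that the alignment step itself went wrong.

```shell
# total_mapped: hypothetical helper that reads idxstats output on stdin
# and prints the total number of mapped reads (sum of column 3).
total_mapped() {
    awk -F '\t' '{ mapped += $3 } END { print mapped + 0 }'
}

# Usage (requires samtools and an indexed BAM; the file name is a placeholder):
#   samtools idxstats aligned.bam | total_mapped
```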

Hi Hasindu,

I'm running minimap2 afresh, as the previous run was not okay.

Please find my script for minimap2 in this google drive link: https://drive.google.com/drive/folders/1adRWqdWoSC-sbvb9Za47zp6ZgX2BUk4_?usp=share_link

I'm not getting any BAM file, and the error file contains the following:
[M::mm_idx_gen::3.640*1.46] collected minimizers
[M::mm_idx_gen::4.456*2.34] sorted minimizers
[M::main::4.463*2.34] loaded/built the index for 116912 target sequence(s)
[M::mm_mapopt_update::4.693*2.27] mid_occ = 101
[M::mm_idx_stat] kmer size: 14; skip: 5; is_hpc: 0; #seq: 116912
[M::mm_idx_stat::4.880*2.22] distinct minimizers: 20376920 (35.46% are singletons); average occurrences: 3.560; average spacing: 2.977; total length: 215951816
[M::worker_pipeline::21.000*12.71] mapped 251671 sequences
[M::main] Version: 2.24-r1122
[M::main] CMD: ./minimap2 -t 20 -ax splice -uf -k14 /scratch/project_mnt/S0077/ONT_Reza/annotations/Mus_musculus.GRCm39.cdna.all.fa /scratch/project_mnt/S0077/ONT_Reza/KD1_1/fastq_dir/pass/basecalled.fastq
[M::main] Real time: 21.075 sec; CPU: 266.960 sec; Peak RSS: 4.100 GB
samtools sort: failed to read header from "-"
[E::hts_open_format] Failed to open file "/scratch/project_mnt/S0077/ONT_Reza/KD1_1/minimap2/aln.sorted.bam" : No such file or directory
samtools index: failed to open "/scratch/project_mnt/S0077/ONT_Reza/KD1_1/minimap2/aln.sorted.bam": No such file or directory

I'll be very grateful if you kindly let me know where the problem is in the script.

Best regards
Reza

You must provide aln.sam as an argument to samtools, for instance:
samtools sort aln.sam -o /scratch/project_mnt/S0077/ONT_Reza/KD1_1/minimap2/aln.sorted.bam
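The "failed to read header" message means samtools sort received no usable SAM input. A sketch of a guard around the sort and index steps is below. The original script is not shown in this thread, so this is an illustration under assumptions rather than a fix to that script: sam_to_sorted_bam is a hypothetical helper name, and it assumes the SAM was written by minimap2 with -a, which emits @SQ header lines.

```shell
# sam_to_sorted_bam SAM BAM: hypothetical helper, not part of samtools.
# Checks the SAM file before handing it to samtools sort and samtools index.
sam_to_sorted_bam() {
    sam=$1
    bam=$2
    # A missing or empty SAM is a typical cause of "failed to read header".
    if [ ! -s "$sam" ]; then
        echo "error: $sam is missing or empty" >&2
        return 1
    fi
    # minimap2 -a output starts with @-prefixed header lines, including @SQ.
    if ! grep -q '^@SQ' "$sam"; then
        echo "error: $sam has no @SQ header lines" >&2
        return 1
    fi
    # Only reached when the input looks sane; requires samtools on PATH.
    samtools sort -o "$bam" "$sam" && samtools index "$bam"
}
```

For example, sam_to_sorted_bam aln.sam aln.sorted.bam would run only after minimap2 has finished writing aln.sam.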

Thank you so much. It worked well.

f5c eventalign also ran well now. I'll do the same for the other samples. Thank you so much for your big help in overcoming the hurdles. I'm grateful to you.
Best regards
Reza