Unsuccessful resquiggle for RNA

Question

Unsuccessful resquiggle for RNA

vgori opened this issue a year ago · 4 comments

Hi all,

I have sequencing data from the ONT PromethION platform with the Kit SQK-RNA002 and R9 chemistry.
Tombo pipeline has been launched under the conda environment with the following version of tools:
numpy=1.19.5=py37h3e96413_3
python=3.7.12=hf930737_100_cpython
mappy=2.22=py37h190e900_0 1
minimap2=2.22=h5bf99c6_0
which should correspond to the Tombo requirements.

I`ve tried to run resquiggle several ways:

with cDNA human reference genome directly:

tombo resquiggle /path/to/SingleFast5s/
/path/to/Homo_sapiens.GRCh38.cdna.all.fa
--rna
--processes 18
--overwrite
--num-most-common-errors 5

with the previously created index.mmi file:
tombo resquiggle /path/to/SingleFast5s/
/path/to/index_fast5.mmi
--rna
--processes 18
--overwrite
--num-most-common-errors 5
Both ways return to me the result:
94.2% reads unsuccessfully processed

through previously created aligned .sam file:

minimap2 -ax splice -uf -k14 /path/to/index_fast5.mmi
/path/to/fastq_single_ControlA.fastq | samtools sort
-o /path/to/aligned_ControlA.sam

tombo resquiggle /path/to/SingleFast5s/
/path/to/aligned_ControlA.sam
--rna
--processes 18
--overwrite
--num-most-common-errors 5

This way returns me the result with error U bases and 100% reads unsuccessfully processed

Meantime checking aligned_ControlA.sam file with the samtools shows 10195 mapped reads from 15579, which means 65% should be successfully aligned.

Am I missing something? There are issues with the tool?

Answer 1 · 2023-11-13T16:38:31.000Z

I'm trying to a similar analysis using data generated using the RNA002 kit. I'm still stuck on the preprocess step of adding the fastq sequence to the fast5 files. If I get to the resquiggle step, I will come back and post an update.

Answer 2 · 2023-11-20T07:51:37.000Z

Hi @marcus1487,

I got similar problem with running tombo resquiggle:

tombo resquiggle --rna --overwrite --processes 20 fast5_pass/0/ /mnt/dataHDD/wanyk/xpore_related/background_JSON/diffmod_outputs/Homo_sapiens.GRCh38.cdna.all.fa --num-most-common-errors 5
[07:49:24] Loading minimap2 reference.
[07:49:30] Getting file list.
[07:49:31] Re-squiggling reads (raw signal to genomic sequence alignment).
5 most common unsuccessful read types (approx. %):
    93.2% (   3726 reads) : Unexpected error                                                                
     6.8% (    274 reads) : Alignment not produced                                                          
     -----
     -----
     -----
100%|█████████████████████████████████████████████████████████████████████████████████████| 4000/4000 [00:20<00:00, 198.70it/s]
******************** WARNING ********************
        Unexpected errors occured. See full error stack traces for first (up to) 50 errors in "unexpected_tombo_errors.8338.err"
[07:49:51] Final unsuccessful reads summary (100.0% reads unsuccessfully processed; 4000 total reads):
    93.2% (   3726 reads) : Unexpected error                                                                
     6.8% (    274 reads) : Alignment not produced                                                          
[07:49:51] Saving Tombo reads index to file.

tombo version: 1.5.1
numpy version: 1.19.0

tombo preprocess seems fine:

tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5_pass/0/ --fastq-filenames sample.fastq --overwrite
[07:44:58] Preparing reads and extracting read identifiers.
/home/wanyk/.local/lib/python3.8/site-packages/tombo/_preprocess.py:378: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  with h5py.File(fast5_fn) as fast5_data:
100%|█████████████████████████████████████| 4000/4000 [00:02<00:00, 1632.85it/s]
[07:45:01] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
100%|██████████████████████████████████████| 4000/4000 [00:23<00:00, 172.13it/s]
[07:45:24] Added sequences to a total of 4000 reads.

Thanks!

Best wishes,
Yuk Kei

Answer 3 · 2023-12-14T16:15:07.000Z

Hi @marcus1487,

I got similar problem with running tombo resquiggle:

tombo resquiggle --rna --overwrite --processes 20 fast5_pass/0/ /mnt/dataHDD/wanyk/xpore_related/background_JSON/diffmod_outputs/Homo_sapiens.GRCh38.cdna.all.fa --num-most-common-errors 5
[07:49:24] Loading minimap2 reference.
[07:49:30] Getting file list.
[07:49:31] Re-squiggling reads (raw signal to genomic sequence alignment).
5 most common unsuccessful read types (approx. %):
    93.2% (   3726 reads) : Unexpected error                                                                
     6.8% (    274 reads) : Alignment not produced                                                          
     -----
     -----
     -----
100%|█████████████████████████████████████████████████████████████████████████████████████| 4000/4000 [00:20<00:00, 198.70it/s]
******************** WARNING ********************
        Unexpected errors occured. See full error stack traces for first (up to) 50 errors in "unexpected_tombo_errors.8338.err"
[07:49:51] Final unsuccessful reads summary (100.0% reads unsuccessfully processed; 4000 total reads):
    93.2% (   3726 reads) : Unexpected error                                                                
     6.8% (    274 reads) : Alignment not produced                                                          
[07:49:51] Saving Tombo reads index to file.

tombo version: 1.5.1 numpy version: 1.19.0

tombo preprocess seems fine:

tombo preprocess annotate_raw_with_fastqs --fast5-basedir fast5_pass/0/ --fastq-filenames sample.fastq --overwrite
[07:44:58] Preparing reads and extracting read identifiers.
/home/wanyk/.local/lib/python3.8/site-packages/tombo/_preprocess.py:378: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  with h5py.File(fast5_fn) as fast5_data:
100%|█████████████████████████████████████| 4000/4000 [00:02<00:00, 1632.85it/s]
[07:45:01] Annotating FAST5s with sequence from FASTQs.
****** WARNING ****** Some FASTQ records contain read identifiers not found in any FAST5 files or sequencing summary files.
100%|██████████████████████████████████████| 4000/4000 [00:23<00:00, 172.13it/s]
[07:45:24] Added sequences to a total of 4000 reads.

Thanks!

Best wishes, Yuk Kei

Hi Yuk Kei!

I just met the same issue as you did. I wonder if you find the solution for this problem?

Thanks.

Answer 4 · 2023-12-14T17:26:07.000Z

Tombo is deprecated and as such the issues here will not be addressed. We have released full support for signal analysis of RNA reads in Remora. I would strongly suggest you move your pipelines to the Remora signal analysis API.