novoalab/nanoRMS

Error in get_trace_for_reference_bases while running get_features.py

mnrusimh opened this issue · 2 comments

Thanks for writing this tool. However, I am having difficulties using it. I am trying to run get_features.py and I am encountering the following error.

$ per_read/get_features.py --rna --fasta reference/reference.fa --input fast5/*.fast5
[2022-02-23 03:30:10] Processing 60 file(s)... [mem: 135 MB]
Traceback (most recent call last):
File "per_read/get_features.py", line 388, in
main()
File "per_read/get_features.py", line 383, in main
bamfiles = mod_encode(o.input, o.fasta, o.threads, o.rna, o.sensitive)
File "per_read/get_features.py", line 344, in mod_encode
return list(p.starmap(process_fast5, args))
File "per_read/get_features.py", line 307, in process_fast5
tr = get_trace_for_reference_bases(a, res.read, rna) # this takes 189µs (>50%) of time!
File "per_read/get_features.py", line 258, in get_trace_for_reference_bases
s, e = move_pos[qi:qi+2]
ValueError: not enough values to unpack (expected 2, got 0)

When I attempted debugging, I noticed that there was only a single element in the array move_pos whose value was len(trace).
Also, move had a value "none" and np.argwhere(move==1).flatten() returned an empty array.

I appreciate your help in fixing this issue. Happy to provide any further info need. Thanks for your time.

Hi @mnrusimh , as you pointed out, you Fast5 are likely missing Move table. You'll likely need to basecall your Fast5 files using --fast5_out.

You should see both, Trace and Move in h5ls output

h5ls -r ~/src/nanoRMS/per_read/guppy3.0.3.hac/RNA235629_WT30C/workspace/batch0.fast5|head
/                        Group
/read_00007047-b8c6-44cb-a47c-9f2f4cd38cc7 Group
/read_00007047-b8c6-44cb-a47c-9f2f4cd38cc7/Analyses Group
/read_00007047-b8c6-44cb-a47c-9f2f4cd38cc7/Analyses/Basecall_1D_000 Group
/read_00007047-b8c6-44cb-a47c-9f2f4cd38cc7/Analyses/Basecall_1D_000/BaseCalled_template Group
/read_00007047-b8c6-44cb-a47c-9f2f4cd38cc7/Analyses/Basecall_1D_000/BaseCalled_template/Fastq Dataset {SCALAR}
/read_00007047-b8c6-44cb-a47c-9f2f4cd38cc7/Analyses/Basecall_1D_000/BaseCalled_template/Move Dataset {1980}
/read_00007047-b8c6-44cb-a47c-9f2f4cd38cc7/Analyses/Basecall_1D_000/BaseCalled_template/Trace Dataset {1980, 8}

I have added further warning to the program if Move table is missing.

Let me know if that solves the problem.

Hi @lpryszcz, rerunning the base calling with --fast5_out fixed it. Thanks for the help.