nanoporetech/remora

Can remora infer take a directory of pod5s?

lkwhite opened this issue · 5 comments

What I'm running:

remora infer from_pod5_and_bam pod5_pass supv5pseUalignedwithmvtables.bam --reference-anchored --model /path/to/remora_env/lib/python3.11/site-packages/remora/trained_models/rna004_130bps_sup@v5.0.0_pseU@v1.pt --out-bam remorainfer.bam

If I point it to my directory of pod5s as above, I get FileNotFoundError(f"Failed to open pod5 file at: {path}")

If I concatenate my pod5s into one and pass that directly, I get RuntimeError: Invalid: Not an Arrow file

In classic "describing the issue prompts the user to find a solution" fashion, I thought about this a bit more and remembered fast5s were cat-able but pod5s need pod5 merge. So this is resolved, though being able to point it to a directory of many pod5s would be nice.
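For anyone landing here with the same problem, the merge step can be scripted. This is a minimal sketch, not part of remora: the helper names `build_merge_command` and `merge_pod5_dir` are made up for illustration, and it assumes the `pod5` command-line tool is installed and that `pod5 merge` accepts a list of input files plus `--output`.

```python
from pathlib import Path
import shutil
import subprocess


def build_merge_command(pod5_dir, output):
    """Return the argv list for `pod5 merge` over every .pod5 file in pod5_dir.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    inputs = sorted(str(p) for p in Path(pod5_dir).glob("*.pod5"))
    if not inputs:
        raise FileNotFoundError(f"No .pod5 files found in: {pod5_dir!r}")
    # Assumption: `pod5 merge <inputs...> --output <file>` is the CLI shape.
    return ["pod5", "merge", *inputs, "--output", str(output)]


def merge_pod5_dir(pod5_dir, output):
    """Run the merge; requires the `pod5` command-line tool on PATH."""
    if shutil.which("pod5") is None:
        raise RuntimeError("pod5 CLI not found on PATH")
    subprocess.run(build_merge_command(pod5_dir, output), check=True)
```

Calling `merge_pod5_dir("pod5_pass", "merged.pod5")` would then produce a single file that can be passed to remora in place of the directory. Unlike `cat`, this goes through the pod5 tooling, so the output should be a valid pod5 file.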

A directory of pod5s as input to remora infer is supported. Can you post the full error here?

remorainfer8802960.errout
::::::::::::::
Indexing BAM by read id: 26901 Reads [00:00, 45180.70 Reads/s]
Indexing BAM by read id: 59751 Reads [00:01, 46670.75 Reads/s]
Indexing BAM by read id: 92601 Reads [00:01, 48449.10 Reads/s]
Indexing BAM by read id: 125451 Reads [00:02, 47245.34 Reads/s]
Indexing BAM by read id: 151583 Reads [00:03, 48085.29 Reads/s]
Traceback (most recent call last):
  File "/cluster/software/modules-python/python/3.8.5/bin/remora", line 8, in <module>
    sys.exit(run())
  File "/cluster/software/modules-python/python/3.8.5/lib/python3.8/site-packages/remora/main.py", line 71, in run
    cmd_func(args)
  File "/cluster/software/modules-python/python/3.8.5/lib/python3.8/site-packages/remora/parsers.py", line 1129, in run_infer_from_pod5_and_bam
    infer_from_pod5_and_bam(
  File "/cluster/software/modules-python/python/3.8.5/lib/python3.8/site-packages/remora/inference.py", line 299, in infer_from_pod5_and_bam
    with pod5.Reader(Path(pod5_path)) as pod5_fh:
  File "/beevol/home/whitel/.local/lib/python3.8/site-packages/pod5/reader.py", line 655, in __init__
    ) = self._open_arrow_table_handles(self._path)
  File "/beevol/home/whitel/.local/lib/python3.8/site-packages/pod5/reader.py", line 689, in _open_arrow_table_handles
    raise FileNotFoundError(f"Failed to open pod5 file at: {path}")
FileNotFoundError: Failed to open pod5 file at: /beevol/home/whitel/tRNAworkshop/livebasecalled /WTyeast004_20231111_1104_P2S-00519-B_PAQ47538_fa3726ec/pod5_pass

aaaaaand now I see the typo

Or actually maybe not? There's a space here
/beevol/home/whitel/tRNAworkshop/livebasecalled /WTyeast004_20231111_1104_P2S-00519-B_PAQ47538_fa3726ec/pod5_pass
But this is run from inside that run directory, so I'm only passing pod5_pass in my script. Not sure how that space is getting introduced.
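One way to check whether the space is really in the path, or only in the pasted log, is to print the path with `repr` and see what Python finds there. The sketch below is a hypothetical diagnostic, not anything remora provides; `describe_pod5_input` is a made-up name and it uses only the standard library. Since the traceback shows remora running from a Python 3.8.5 install while the model path in the command sits under python3.11, it also reports which `remora` executable is first on PATH.

```python
from pathlib import Path
import shutil


def describe_pod5_input(path_str):
    """Collect basic facts about a path that is about to be passed to remora."""
    raw = str(path_str)
    path = Path(raw)
    return {
        # repr() makes stray spaces, tabs or newlines visible.
        "repr": repr(raw),
        "has_whitespace": any(ch.isspace() for ch in raw),
        # Absolute form of the path, relative to the current working directory.
        "resolved": str(path.resolve()),
        "is_dir": path.is_dir(),
        "is_file": path.is_file(),
        # Only the top level is listed; nested folders are not searched.
        "pod5_files": sorted(p.name for p in path.glob("*.pod5")) if path.is_dir() else [],
        # None if no `remora` executable is found on PATH.
        "remora_executable": shutil.which("remora"),
    }


if __name__ == "__main__":
    for key, value in describe_pod5_input("pod5_pass").items():
        print(f"{key}: {value}")
```

Run from the same working directory and environment as the job script, this should show whether `pod5_pass` resolves to the expected directory and whether the remora being picked up is the one that was intended.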