raw signal to reference sequence?

Question

zhongzhd opened this issue a year ago · 3 comments

Hello, we know that a sequence basecalled from raw signal is bound to contain mismatches, insertions, or deletions relative to the reference sequence. So how is the signal assigned to the reference sequence?

Answer 1 · 2023-12-28T22:53:44.000Z

By default, remora dataset prepare anchors training chunks to the reference sequence. The --basecall-anchor argument produces training chunks anchored to the basecalls instead.

Answer 2 · 2024-01-15T02:13:33.000Z

Sorry for any confusion in my previous statement. What I meant is this: basecalling derives the read sequence from the raw electrical signal, so conversely it is also possible to obtain the signal corresponding to each base (k-mer). However, because the resulting read sequence often does not perfectly match the reference sequence (mismatches, insertions, deletions, etc.), I would like to understand the principles behind assigning the raw electrical signal to each base in the reference sequence (similar to the 'resquiggle' step in Tombo and the 'eventalign' step in Nanopolish). Thank you very much!

Answer 3 · 2024-01-19T18:15:18.000Z

The notebooks section of this repository goes into detail describing this procedure in Remora. If you have specific questions after reviewing this material, please post them here.
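
For readers who want a feel for the general idea before opening the notebooks, here is a rough sketch of one common way to project signal onto reference bases. It is not taken from Remora's code. It assumes the basecaller emitted a move table (one 0/1 flag per stride of signal, where 1 marks the start of a new base) and that the read was aligned with a CIGAR containing only M, =, X, I and D operations. The function names, the toy move table and the toy CIGAR are all hypothetical.

```python
import numpy as np


def query_to_signal_from_moves(moves, stride, signal_start=0):
    """Return the signal index where each basecalled base starts,
    followed by the index where the last base ends."""
    moves = np.asarray(moves)
    starts = np.flatnonzero(moves == 1) * stride + signal_start
    end = moves.size * stride + signal_start
    return np.append(starts, end)


def ref_to_query_knots(cigar):
    """Return the query coordinate at every reference base boundary.

    cigar is a list of (op, length) pairs. Simplifications made here:
    signal from inserted query bases is absorbed by the next reference
    base, and deleted reference bases receive zero-width signal.
    """
    knots = [0]
    q = 0
    for op, length in cigar:
        if op in ("M", "=", "X"):
            # Query and reference advance together.
            for _ in range(length):
                q += 1
                knots.append(q)
        elif op == "I":
            # Extra bases in the read: only the query advances.
            q += length
        elif op == "D":
            # Bases missing from the read: only the reference advances.
            knots.extend([q] * length)
        else:
            raise ValueError(f"unsupported CIGAR op in this sketch: {op}")
    return np.asarray(knots)


def ref_to_signal(query_to_signal, knots):
    """Look up the signal boundary for every reference base boundary."""
    return np.asarray(query_to_signal)[knots]


# Toy read: 5 basecalled bases, 10 move-table entries, stride of 5 samples.
moves = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
q2s = query_to_signal_from_moves(moves, stride=5)

# Toy alignment: 2 matches, 1 insertion, 1 match, 1 deletion, 1 match.
cigar = [("M", 2), ("I", 1), ("M", 1), ("D", 1), ("M", 1)]
knots = ref_to_query_knots(cigar)
r2s = ref_to_signal(q2s, knots)

# Reference base i is assigned signal[r2s[i]:r2s[i + 1]].
print(r2s.tolist())  # [0, 10, 15, 40, 40, 50]
```

In this toy case the third reference base absorbs the signal of the inserted base (samples 15 to 40), and the deleted reference base gets an empty span (40 to 40). Real tools typically follow such a projection with a refinement step that shifts boundaries to better match expected signal levels; the sketch leaves that out.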