nanoporetech/remora

raw signal to reference sequence?

zhongzhd opened this issue · 3 comments

Hello, we know that a sequence basecalled from raw signal will inevitably contain mismatches, insertions, or deletions relative to the reference sequence. How, then, is the signal assigned to positions in the reference sequence?

The default for remora dataset prepare is to anchor to the reference sequence. The --basecall-anchor argument will instead produce training chunks anchored to basecalls.

Sorry for any confusion in my previous post. What I meant is this: basecalling derives the read sequence from the raw electrical signal, so conversely it should be possible to recover the signal corresponding to each base (k-mer). However, because the read sequence often does not perfectly match the reference sequence (mismatches, insertions, deletions, etc.), I would like to understand the principles behind assigning the raw electrical signal to each base in the reference sequence (similar to the 'resquiggle' step in Tombo and the 'eventalign' step in Nanopolish). Thank you very much!

The notebooks section of this repository goes into detail describing this procedure in Remora. If you have specific questions after reviewing this material, please post them here.
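
As a rough illustration of the general idea only (this is not Remora's code, and the function name and inputs are hypothetical), the sketch below assumes two things are already available: per-base signal boundaries for the basecalled read, and a CIGAR-style alignment of that read to the reference. It walks the alignment and carries the signal boundaries over to reference coordinates. The handling of indels here is one simple assumption among several possible choices: signal under inserted bases is merged into the preceding reference base, and deleted reference bases receive an empty interval.

```python
def project_signal_to_reference(query_to_signal, cigar):
    """Carry basecall signal boundaries over to reference bases.

    query_to_signal: boundaries of length (n_basecalled_bases + 1); basecalled
        base i owns signal[query_to_signal[i]:query_to_signal[i + 1]].
    cigar: list of (op, length) pairs using M, =, X, I, D only
        (clipping is assumed to have been removed beforehand).
    Returns boundaries of length (n_reference_bases + 1).
    """
    q = 0  # number of basecalled bases consumed so far
    ref_to_signal = [query_to_signal[0]]
    for op, length in cigar:
        if op in ("M", "=", "X"):
            # Aligned base: the reference base inherits the basecalled
            # base's end boundary, whether it matches or mismatches.
            for _ in range(length):
                q += 1
                ref_to_signal.append(query_to_signal[q])
        elif op == "I":
            # Extra basecalled bases: push the last boundary forward so the
            # preceding reference base absorbs their signal. An insertion
            # before the first reference base is trimmed instead.
            q += length
            ref_to_signal[-1] = query_to_signal[q]
        elif op == "D":
            # Reference bases with no basecalled base: empty intervals.
            ref_to_signal.extend([query_to_signal[q]] * length)
        else:
            raise ValueError(f"unsupported CIGAR operation: {op}")
    if q != len(query_to_signal) - 1:
        raise ValueError("CIGAR does not consume every basecalled base")
    return ref_to_signal


# Five basecalled bases with 10 signal samples each (made-up numbers).
boundaries = project_signal_to_reference(
    [0, 10, 20, 30, 40, 50],
    [("M", 2), ("I", 1), ("M", 1), ("D", 1), ("M", 1)],
)
print(boundaries)  # [0, 10, 30, 40, 40, 50]
```

In this toy case the second reference base spans samples 10 to 30 because it absorbs the inserted base, and the fourth reference base (a deletion) gets the empty interval 40 to 40. A projection like this is only a starting point; a signal-level refinement step would be needed to give deleted bases a sensible share of the signal.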