mikessh/migec

Why seed and fuzzy-search in MIGEC?

chengyangit opened this issue · 0 comments

I was trying out the MIGEC test data. The barcode.txt file records the adapter sequence plus the UMI (marked as N). According to the manual, the adapter sequences are written in either lower or upper case to indicate fuzzy or seed search, respectively.
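To make my reading of the format concrete, here is a minimal Python sketch of how I understand a barcode pattern to be split into regions. This is my own illustration, not MIGEC code: the function names `classify` and `split_pattern` and the example pattern are hypothetical, and the mapping (upper case = seed, lower case = fuzzy, `N` = UMI) is my assumption from the manual.

```python
from itertools import groupby


def classify(ch):
    """Label one pattern character (assumed convention, see lead-in)."""
    if ch == "N":
        return "umi"      # N marks a UMI position
    if ch.isupper():
        return "seed"     # upper case: assumed to be the seed region
    return "fuzzy"        # lower case: assumed to be the fuzzy-search region


def split_pattern(pattern):
    """Group consecutive characters of the same kind into (kind, text) runs."""
    return [(kind, "".join(chars)) for kind, chars in groupby(pattern, key=classify)]


# Hypothetical pattern, not taken from the test data.
print(split_pattern("acgtACGTNNNNNNtgca"))
```

If my reading is right, the part I am asking about is everything that comes out labelled `seed` or `fuzzy` ahead of the `umi` run.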

I am curious about the portion of the sequence before the UMI.

As far as I understand, the sequence before the UMI should be the i7 index (library index). So are all the bases (around 20) before the UMI the i7 index? Shouldn't the library index already have been removed during demultiplexing?

Why use fuzzy or seed search? What is the intuitive explanation for this, and which sequences correspond to these two parts?