0xTCG/biser

mismatch rate

hrrsjeong opened this issue · 1 comment

Hello @inumanag
I have a quick question about the optional fields in the biser output. Is the mismatch rate (the field starting with X=) computed as mismatch count / (match count + mismatch count), or as mismatch count / aln_len (including gaps)? Similarly, could you please let me know whether you used the same denominator for both the mismatch rate (X=) and the gap rate (ID=)? I assume they are analogous to 1 - fracMatch and 1 - fracMatchIndel in SEDEF, but it would be great if you could confirm this.

Also, I would like to compare the results of biser and sedef. I found that the biser output doesn't provide the alnB, matchB, and mismatchB fields that the sedef output has. Is there a way to extract this information? I tried using the sum of the lengths of the M operations in the CIGAR string divided by mismatch, but again, it wasn't clear how the mismatch rate (X=) in the optional field is computed. Thanks so much!

Here are the details:

def mis_err(self):

so it should be aln_len.
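
To make the difference between the two denominators from the question concrete, here is a minimal sketch. It is not biser's code: the function name `mismatch_rates` and its arguments are hypothetical, and it only assumes that aln_len counts matched, mismatched, and gap columns.

```python
def mismatch_rates(matches, mismatches, gap_bases):
    """Return the mismatch rate under both candidate denominators.

    Hypothetical helper for illustration only; not taken from biser or SEDEF.
    """
    aligned = matches + mismatches   # columns where both sequences have a base
    aln_len = aligned + gap_bases    # full alignment length, gaps included
    rate_no_gaps = mismatches / aligned    # mismatch / (match + mismatch)
    rate_with_gaps = mismatches / aln_len  # mismatch / aln_len
    return rate_no_gaps, rate_with_gaps


# 90 matches, 5 mismatches, 5 gap bases: about 0.0526 without gaps, 0.05 with gaps.
no_gaps, with_gaps = mismatch_rates(90, 5, 5)
```

The two values only coincide when the alignment has no gaps, so the choice of denominator matters most for gap-rich alignments.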

I cannot recall exactly how SEDEF generated that info. The exact procedure that does it can be found here:
https://github.com/vpc-ccg/sedef/blob/5acd139436fe823302f27b7a526299da6db8034f/src/stats_main.cc#L245

For mismatchB, you need to parse the CIGAR string together with the sequences themselves, since the CIGAR uses only M and does not distinguish matches (=) from mismatches (X).
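
A rough sketch of that CIGAR walk, in case it helps. This is not code from biser or SEDEF: `count_cigar_columns` is a hypothetical helper, and it assumes the usual SAM convention that I consumes only the second (query) sequence and D consumes only the first (reference) sequence. Check which of the two sequences biser treats as the reference before relying on it.

```python
import re

CIGAR_OP = re.compile(r"(\d+)(\D)")  # run length followed by a one-letter operation


def count_cigar_columns(cigar, ref, qry):
    """Split M columns into matches and mismatches by comparing the bases.

    Hypothetical helper. Assumes I consumes only qry and D consumes only ref.
    Returns (matches, mismatches, gap_bases).
    """
    i = j = 0  # current positions in ref and qry
    matches = mismatches = gap_bases = 0
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned columns: compare base by base, ignoring case.
            for k in range(n):
                if ref[i + k].upper() == qry[j + k].upper():
                    matches += 1
                else:
                    mismatches += 1
            i += n
            j += n
        elif op == "I":
            gap_bases += n
            j += n
        elif op == "D":
            gap_bases += n
            i += n
        else:
            raise ValueError(f"unsupported CIGAR operation: {op}")
    return matches, mismatches, gap_bases


# Toy example: 4M1I2M over ACGTAC vs ACCTGAC gives 5 matches, 1 mismatch, 1 gap base.
print(count_cigar_columns("4M1I2M", "ACGTAC", "ACCTGAC"))
```

The three counts can then be combined with whichever denominator you want to compare against the SEDEF columns.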