mismatch rate
hrrsjeong opened this issue · 1 comments
Hello @inumanag
I have a quick question regarding the optional fields in the biser output. Is the mismatch rate (the field starting with `X=`) computed as mismatch count / (match count + mismatch count) or as mismatch count / `aln_len` (including gaps)? Similarly, could you please let me know whether you used the same denominator when computing the mismatch rate (`X=`) and the gap rate (`ID=`)? I assume they are similar to 1 - `fracMatch` and 1 - `fracMatchIndel` in SEDEF, but it would be great if you could confirm this.
Also, I would like to compare the results between `biser` and `sedef`. I found that the `biser` output doesn't provide the `alnB`, `matchB`, and `mismatchB` fields found in the `sedef` output. I was wondering if there's a way to extract this information. I tried dividing the summed length of the `M` operations in the CIGAR string by the mismatch count, but again, it wasn't clear how the mismatch rate (`X=`) in the optional field is computed. Thanks so much!!
Here are the details:
Line 200 in 39df972
so it should be `aln_len`.
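To make the difference between the two denominators concrete, here is a minimal sketch. It assumes `aln_len` counts every alignment column, gap columns included, as described in the question. The function name and its arguments are hypothetical and are not part of BISER.

```python
def mismatch_rates(matches: int, mismatches: int, gap_bases: int) -> tuple[float, float]:
    """Return (rate over aligned bases only, rate over the full alignment length)."""
    aligned = matches + mismatches   # columns where both sequences have a base
    aln_len = aligned + gap_bases    # assumption: aln_len also counts gap columns
    if aligned == 0:
        raise ValueError("alignment has no aligned bases")
    return mismatches / aligned, mismatches / aln_len


# 90 matches, 5 mismatches, 5 gap bases
no_gaps, with_gaps = mismatch_rates(90, 5, 5)
print(f"{no_gaps:.4f} {with_gaps:.4f}")
```

With gaps present, the `aln_len` denominator always gives the smaller of the two rates, so the two definitions only agree for gap-free alignments.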
I cannot recall exactly how SEDEF generated that info. The exact procedure that does it can be found here:
https://github.com/vpc-ccg/sedef/blob/5acd139436fe823302f27b7a526299da6db8034f/src/stats_main.cc#L245
For `mismatchB` you need to parse the CIGAR together with the sequences themselves (as it does not include `X`/`=` operations but just `M`).
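A rough sketch of that parsing step is shown below. It is not the SEDEF procedure linked above, only an illustration: it assumes the usual SAM convention that `I` consumes the second sequence and `D` consumes the first (BISER's convention may differ), handles only the `M`, `=`, `X`, `I`, and `D` operations, and uses a hypothetical function name.

```python
import re

# One CIGAR token: a run length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDX=])")


def count_alignment_columns(cigar: str, seq_a: str, seq_b: str) -> dict:
    """Count matches, mismatches, and gap bases by walking the CIGAR over both sequences."""
    counts = {"match": 0, "mismatch": 0, "gap": 0}
    i = j = 0  # current positions in seq_a and seq_b
    for length, op in CIGAR_OP.findall(cigar):
        n = int(length)
        if op in "M=X":
            # M does not say whether a column matches, so compare the bases directly.
            for k in range(n):
                if seq_a[i + k].upper() == seq_b[j + k].upper():
                    counts["match"] += 1
                else:
                    counts["mismatch"] += 1
            i += n
            j += n
        elif op == "D":  # assumption: deletion consumes seq_a only
            counts["gap"] += n
            i += n
        elif op == "I":  # assumption: insertion consumes seq_b only
            counts["gap"] += n
            j += n
    return counts


print(count_alignment_columns("3M1I2M1D2M", "ACGTTGAC", "ACCATTAC"))
```

From these counts, the mismatch rate over the full alignment length would be `mismatch / (match + mismatch + gap)`.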