qurator-spk/dinglehopper

Review error rate definitions etc.

mikegerber opened this issue · 2 comments

I suggest using the alignment path length as the denominator instead of the GT length (with the GT length, the error rate can exceed 1):

if d == 0:
    return 0, n
if n == 0:
    return float("inf"), n
return d / n, n

(Ideally, implement all three length options: alignment path length, maximum sequence length, and GT sequence length.)
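To make the three denominators concrete, here is a minimal sketch that wraps the snippet above in a function. The name `error_rate` and the `denominator` parameter are made up for illustration and are not part of dinglehopper; the distance and the three lengths are assumed to be computed elsewhere.

```python
def error_rate(d, gt_len, ocr_len, path_len, denominator="path"):
    """Return (error rate, n) for distance d, normalized by the chosen length.

    Hypothetical helper: `denominator` selects the alignment path length
    ("path"), the longer of the two sequences ("max"), or the GT length ("gt").
    """
    lengths = {
        "path": path_len,
        "max": max(gt_len, ocr_len),
        "gt": gt_len,
    }
    n = lengths[denominator]
    if d == 0:
        return 0, n
    if n == 0:
        return float("inf"), n
    return d / n, n


# GT "a" vs. OCR "abcd": 3 insertions, alignment path of length 4.
print(error_rate(3, 1, 4, 4, "gt"))    # rate above 1 with the GT length
print(error_rate(3, 1, 4, 4, "max"))
print(error_rate(3, 1, 4, 4, "path"))
```

With the GT length, the example yields a rate of 3.0, while the other two options stay at 0.75.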

The problem for dinglehopper is that your levenshtein_matrix does not give you the alignment path; you only have the resulting minimum distance.
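One way to get the path length without a separate backtrace is to carry it through the dynamic program alongside the distance. The following is a rough sketch, not dinglehopper's actual `levenshtein_matrix`; the function name `distance_and_path_length` is hypothetical, and it assumes unit costs and breaks ties between equally cheap alignments in favor of the shorter path.

```python
def distance_and_path_length(gt, ocr):
    """Return (Levenshtein distance, alignment path length) for two sequences.

    Each cell holds a (distance, path_length) tuple. Taking the tuple minimum
    picks the smallest distance first and, among ties, the shortest path
    (an assumption; other tie-breaking conventions are possible).
    """
    # Row 0: aligning the empty GT prefix with j OCR items takes j insertions.
    prev = [(j, j) for j in range(len(ocr) + 1)]
    for i in range(1, len(gt) + 1):
        # Column 0: aligning i GT items with the empty OCR prefix takes i deletions.
        cur = [(i, i)]
        for j in range(1, len(ocr) + 1):
            sub_cost = 0 if gt[i - 1] == ocr[j - 1] else 1
            candidates = (
                (prev[j - 1][0] + sub_cost, prev[j - 1][1] + 1),  # match or substitution
                (prev[j][0] + 1, prev[j][1] + 1),                 # deletion
                (cur[j - 1][0] + 1, cur[j - 1][1] + 1),           # insertion
            )
            cur.append(min(candidates))
        prev = cur
    return prev[-1]


d, n = distance_and_path_length("abc", "bcd")
print(d, n)  # distance and path length, usable as d / n
```

For "abc" vs. "bcd" the cheapest alignment is delete, match, match, insert, so the path (4 steps) is longer than either sequence (3 items). This shows that the path length and the maximum sequence length are not always the same.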

Update: I recommend using rapidfuzz's normalized_distance instead of just dividing the distance by the GT length. Internally (in the C++ backend), the denominator is calculated as the actual path length (= maximum distance).