Review error rate definitions etc.
mikegerber opened this issue · 2 comments
mikegerber commented
Review error rate definitions etc.
bertsky commented
I suggest using the alignment path length as the denominator instead of the GT length (with the GT length, the error rate can exceed 1):
dinglehopper/qurator/dinglehopper/character_error_rate.py
Lines 24 to 28 in 2497876
(Ideally, implement all three length options: alignment path length, maximum sequence length, and GT sequence length.)
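A minimal sketch of the two options that need only the edit distance, assuming unit costs for insertion, deletion and substitution. The function names are hypothetical, not dinglehopper's API. It shows why the GT-length denominator can push the rate above 1: a single GT character read as three OCR characters costs three edits.

```python
def levenshtein_distance(gt: str, ocr: str) -> int:
    """Unit-cost edit distance, computed row by row (no alignment path kept)."""
    prev = list(range(len(ocr) + 1))  # distances from the empty GT prefix
    for i, g in enumerate(gt, start=1):
        curr = [i] + [0] * len(ocr)
        for j, o in enumerate(ocr, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete g
                curr[j - 1] + 1,         # insert o
                prev[j - 1] + (g != o),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def cer_by_gt_length(gt: str, ocr: str) -> float:
    """Denominator = len(gt); can exceed 1 when OCR adds characters."""
    if not gt:
        # Assumed convention for this sketch: 0 for two empty strings, else infinity.
        return 0.0 if not ocr else float("inf")
    return levenshtein_distance(gt, ocr) / len(gt)


def cer_by_max_length(gt: str, ocr: str) -> float:
    """Denominator = length of the longer sequence; stays within [0, 1]."""
    longest = max(len(gt), len(ocr))
    if longest == 0:
        return 0.0  # two empty strings: no errors
    return levenshtein_distance(gt, ocr) / longest


print(cer_by_gt_length("a", "xyz"))   # 3.0
print(cer_by_max_length("a", "xyz"))  # 1.0
```

The third option, the alignment path length, cannot be computed from the distance alone; it needs the alignment itself.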
The problem for dinglehopper is that your levenshtein_matrix does not give you the alignment path; you only get the resulting minimum distance.
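For illustration, a sketch of how an alignment path can be recovered when the full matrix is kept and traced back from the bottom-right cell. The helper names (edit_matrix, alignment_path, cer_by_path_length) are hypothetical and not dinglehopper's API; unit costs are assumed.

```python
from typing import List, Optional, Tuple

Step = Tuple[Optional[str], Optional[str]]  # (GT char, OCR char); None = gap


def edit_matrix(gt: str, ocr: str) -> List[List[int]]:
    """Full unit-cost Levenshtein matrix; d[i][j] = distance(gt[:i], ocr[:j])."""
    d = [[0] * (len(ocr) + 1) for _ in range(len(gt) + 1)]
    for i in range(len(gt) + 1):
        d[i][0] = i
    for j in range(len(ocr) + 1):
        d[0][j] = j
    for i in range(1, len(gt) + 1):
        for j in range(1, len(ocr) + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                              # deletion
                d[i][j - 1] + 1,                              # insertion
                d[i - 1][j - 1] + (gt[i - 1] != ocr[j - 1]),  # substitution or match
            )
    return d


def alignment_path(gt: str, ocr: str) -> List[Step]:
    """Trace back one optimal alignment (diagonal steps preferred on ties)."""
    d = edit_matrix(gt, ocr)
    i, j = len(gt), len(ocr)
    path: List[Step] = []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (gt[i - 1] != ocr[j - 1]):
            path.append((gt[i - 1], ocr[j - 1]))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            path.append((gt[i - 1], None))  # GT char deleted
            i -= 1
        else:
            path.append((None, ocr[j - 1]))  # OCR char inserted
            j -= 1
    path.reverse()
    return path


def cer_by_path_length(gt: str, ocr: str) -> float:
    """Edit distance divided by the number of alignment steps."""
    path = alignment_path(gt, ocr)
    if not path:
        return 0.0  # both strings empty
    errors = sum(1 for g, o in path if g != o)  # every non-match step costs 1
    return errors / len(path)


print(alignment_path("abc", "ac"))     # [('a', 'a'), ('b', None), ('c', 'c')]
print(cer_by_path_length("a", "xyz"))  # 1.0
```

Note that co-optimal alignments can differ in length: for "ab" vs "ba", two substitutions and delete/match/insert both cost 2, but take 2 and 3 steps respectively. A path-length denominator therefore depends on the tie-breaking rule used during the traceback.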
bertsky commented
Update: I recommend using rapidfuzz's normalized_distance instead of just dividing the distance by the GT length. Internally (in the C++ backend), the denominator is calculated as the actual path length (i.e. the maximum distance).
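To make the normalization concrete, a pure-Python sketch of a maximum-distance denominator under unit weights. With unit costs, the largest possible edit distance between s1 and s2 is max(len(s1), len(s2)): substitute across the overlap, then insert or delete the rest. This is an illustration only, not rapidfuzz's implementation; the library function itself lives at rapidfuzz.distance.Levenshtein.normalized_distance.

```python
def levenshtein_distance(s1: str, s2: str) -> int:
    """Unit-cost edit distance (two-row dynamic programming)."""
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, start=1):
        curr = [i] + [0] * len(s2)
        for j, b in enumerate(s2, start=1):
            # deletion, insertion, substitution (free if the characters match)
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (a != b))
        prev = curr
    return prev[-1]


def max_distance(s1: str, s2: str) -> int:
    """Largest unit-cost edit distance possible between strings of these lengths."""
    return max(len(s1), len(s2))


def normalized_distance(s1: str, s2: str) -> float:
    """Distance divided by the maximum distance; always in [0, 1]."""
    maximum = max_distance(s1, s2)
    if maximum == 0:
        return 0.0  # two empty strings (convention assumed for this sketch)
    return levenshtein_distance(s1, s2) / maximum


# Compare with the naive GT-length normalization for GT "a", OCR "xyz".
gt, ocr = "a", "xyz"
print(levenshtein_distance(gt, ocr) / len(gt))  # 3.0
print(normalized_distance(gt, ocr))             # 1.0
```

With rapidfuzz installed, Levenshtein.normalized_distance("a", "xyz") from rapidfuzz.distance should give the same 1.0 under its default unit weights.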