zouharvi/ptakopet

Segments are also identified by the stimulus ID

Closed this issue · 1 comment

Only 1575 Czech segments are found when segments are identified by src and tgt alone. Also matching on the stimulus ID (log[7]) gives the correct count of 1588.
The following condition should work instead (int() is needed because stimulus IDs shorter than three digits are zero-padded in the QA logfile):
if int(log[7]) == s.sid and log[9] == src and log[10] == tgt:

The same issue occurs on line 62:

if log[9] == src and log[10] == tgt:
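
For illustration, here is a minimal sketch of the proposed check as a standalone function. The `Segment` tuple, the `matches` helper and the sample row are hypothetical; only the column indices (`log[7]`, `log[9]`, `log[10]`) and the `sid` attribute come from the snippets above.

```python
from collections import namedtuple

# Hypothetical stand-in for the segment object; only `sid` appears in the issue.
Segment = namedtuple("Segment", ["sid", "src", "tgt"])


def matches(log, s):
    """Return True if a QA log row belongs to segment s."""
    # log[7] holds the stimulus ID as a zero-padded string such as "007",
    # so it is converted with int() before comparing to the integer s.sid.
    return int(log[7]) == s.sid and log[9] == s.src and log[10] == s.tgt


# Invented row with 11 columns; only indices 7, 9 and 10 are used here.
row = [""] * 11
row[7], row[9], row[10] = "007", "source text", "target text"

seg = Segment(sid=7, src="source text", tgt="target text")
print(matches(row, seg))       # True
print(row[7] == str(seg.sid))  # False: "007" != "7"
```

Comparing the raw strings would miss the match, which is why the int() conversion matters for IDs below 100.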

I fixed it (along with another bug that prevented multiple quality annotations). However, I don't see how adding another restriction (the SID check) would increase the number in your case.

On old data:
Without SID check: 1052
With SID check: 1037

Fresh data (few new users, 2261 OK segments):
Without SID check: 1099
With SID check: 1082
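
A toy illustration, with invented rows and not based on the actual logs, of how the two counts could move in opposite directions. This is only a guess at the mechanism, assuming one script counts distinct segment keys while the other counts rows that pass the filter:

```python
# Toy rows of (sid, src, tgt); all values are invented for illustration.
rows = [
    ("001", "a", "x"),
    ("002", "a", "x"),  # same src/tgt as the row above, different stimulus
    ("003", "b", "y"),
]

# Counting distinct segments: a richer key can only keep or raise the count.
by_text = {(src, tgt) for _, src, tgt in rows}
by_sid_text = {(int(sid), src, tgt) for sid, src, tgt in rows}
print(len(by_text), len(by_sid_text))  # 2 3

# Counting rows that match one given segment: an extra condition can only
# keep or lower the count.
without_sid = [r for r in rows if r[1] == "a" and r[2] == "x"]
with_sid = [r for r in rows if int(r[0]) == 1 and r[1] == "a" and r[2] == "x"]
print(len(without_sid), len(with_sid))  # 2 1
```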