facebookresearch/dynabench

Export bug with a large number of validations

TristanThrush opened this issue · 0 comments

I just found a bug that was adding noise to the Winoground validations from MTurk. Somewhere in the dynabench export pipeline, there is a group concat that returns the validation ids for an example. The max length of a group concat result is only about 1000 characters by default (in MySQL, `group_concat_max_len` defaults to 1024)! The annoying thing about this bug is that it wasn't crashing. Instead, the list of validation ids was silently truncated, so it would look something like this: "323830,328354,328784,330398,333656,334071,337565,337819,342201,345581,345797,347630,348000,348410,349931,351112,355809,356800,358430,359029,364692,368108,368600,370556,371270,373622,376187,377057,378013,380498,381469,382702,387589,388013,388703,389363,3903". That little 3903 at the end of the list is a cut-off fragment of a longer id, but it also happens to be the validation id for a hate speech validation, so an unrelated validation got mixed into the Winoground export.
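To make the failure mode concrete, here is a minimal Python sketch of how a length-capped concatenation produces a bogus id, plus two possible mitigations. It is not the dynabench export code: the function names, the toy ids, and the tiny limit of 17 characters are all assumptions chosen for illustration.

```python
from collections import defaultdict


def truncate_like_group_concat(ids, max_len):
    """Simulate a group concat whose result is silently cut at max_len characters."""
    return ",".join(str(i) for i in ids)[:max_len]


def parse_ids(concatenated):
    """Split a comma-separated id string back into integers."""
    return [int(part) for part in concatenated.split(",") if part]


def may_be_truncated(concatenated, max_len):
    """Heuristic: a result that reaches the limit was probably cut off."""
    return len(concatenated) >= max_len


def group_validation_ids(rows):
    """Alternative to group concat: group (example_id, validation_id) rows in Python."""
    grouped = defaultdict(list)
    for example_id, validation_id in rows:
        grouped[example_id].append(validation_id)
    return dict(grouped)


# Toy data (hypothetical ids, hypothetical limit of 17 characters).
concatenated = truncate_like_group_concat([100001, 100002, 100003], max_len=17)
print(concatenated)                        # 100001,100002,100
print(parse_ids(concatenated))             # [100001, 100002, 100]
print(may_be_truncated(concatenated, 17))  # True
```

The last parsed value, 100, is a prefix of the real id 100003 and still parses as a valid integer, which is why nothing crashes. Note that counting the parsed ids does not catch this, since the count still matches. If the pipeline does run on MySQL, raising the limit for the session with `SET SESSION group_concat_max_len = ...` before the query is another option, though fetching ungrouped rows and grouping them in Python avoids the limit entirely.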