The number of examples in the out-of-domain dev set is not accurate
maxsonate opened this issue · 1 comment
maxsonate commented
For example:
DROP: 1557 vs 1,503
DuoRC.ParaphraseRC: 1648 vs 1,501
ajfisch commented
Hi,
How are you counting?
If you count using this script, you should get the right numbers:
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import sys
import json
import gzip

examples = 0
fname = sys.argv[1]  # path to the gzipped JSON-lines file

with gzip.open(fname, 'rb') as f:
    for i, line in enumerate(f):
        obj = json.loads(line)
        # The first line may be a header rather than data; skip it.
        if i == 0 and 'header' in obj:
            continue
        # Each remaining line can hold several questions under 'qas'.
        examples += len(obj['qas'])

print('Num examples: %d' % examples)
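As a quick sanity check, the same counting logic can be run on a tiny made-up file. This is only a sketch: the helper names `count_examples` and `write_toy_file` and the toy records are assumptions for illustration and are not part of the repository. The only fields it relies on are `header` and `qas`, as in the script above.

```python
import gzip
import json
import os
import tempfile


def count_examples(fname):
    """Count the entries under 'qas' on every line, skipping an optional header line."""
    examples = 0
    with gzip.open(fname, 'rb') as f:
        for i, line in enumerate(f):
            obj = json.loads(line)
            if i == 0 and 'header' in obj:
                continue
            examples += len(obj['qas'])
    return examples


def write_toy_file(path):
    """Write a made-up file: one header line plus two data lines (3 questions)."""
    rows = [
        {'header': {'dataset': 'toy'}},                          # hypothetical header
        {'context': 'c1', 'qas': [{'qid': 'a'}, {'qid': 'b'}]},  # 2 questions
        {'context': 'c2', 'qas': [{'qid': 'c'}]},                # 1 question
    ]
    with gzip.open(path, 'wb') as f:
        for row in rows:
            f.write((json.dumps(row) + '\n').encode('utf-8'))


if __name__ == '__main__':
    path = os.path.join(tempfile.mkdtemp(), 'toy.jsonl.gz')
    write_toy_file(path)
    print('Num examples: %d' % count_examples(path))
```

On this toy file the count is 3, while the file has 3 lines in total and 2 data lines, so counting lines would give a different number than counting the entries under `qas`.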