Bugs for automated example input/output test case extraction
zfj1998 opened this issue · 1 comments
Hi there! CodeRL is a brilliant idea, thanks for the effort!
I have also worked with the APPS dataset, and I found it hard to extract example test cases from the problem descriptions. After checking your published data, I think your extraction script fails in some cases.
For example, no example test cases were extracted for tasks 4675, 4751, and 4752, because these problem descriptions have no --- tags. I believe this happens for all similar tasks:
[2303, 2365, 2466, 2467, 2468, 2469, 2470, 2627, 2628, 2629, 2630, 2631, 2632, 2633, 2634, 2635, 2636, 2637, 2638, 2639, 2647, 2685, 2703, 2708, 2745, 2746, 2747, 2748, 2882, 2883, 2884, 2885, 2887, 4109, 4479, 4480, 4533, 4534, 4535, 4536, 4658, 4659, 4660, 4661, 4662, 4663, 4664, 4665, 4666, 4667, 4668, 4669, 4670, 4671, 4672, 4673, 4674, 4675, 4751, 4752]
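A rough way to reproduce a list like the one above is to scan every problem description for dash-delimited section tags. The sketch below is not the CodeRL extraction script. It assumes the tags look like `-----Input-----` (a short title wrapped in three or more dashes), and `SECTION_TAG`, `has_section_tags`, and `find_untagged` are hypothetical names chosen for illustration.

```python
import re
from typing import Dict, List

# Assumption: a section tag is a short title wrapped in runs of three or more
# dashes, for example "-----Input-----" or "-----Examples-----".
SECTION_TAG = re.compile(r"-{3,}\s*[A-Za-z][A-Za-z ]*?\s*-{3,}")


def has_section_tags(description: str) -> bool:
    """Return True if the description contains at least one dash-delimited tag."""
    return SECTION_TAG.search(description) is not None


def find_untagged(problems: Dict[int, str]) -> List[int]:
    """Return the sorted task ids whose descriptions contain no section tags."""
    return sorted(
        task_id
        for task_id, text in problems.items()
        if not has_section_tags(text)
    )
```

Here `problems` is assumed to map a task id to the raw text of its problem description. Any id returned by `find_untagged` would need a different extraction rule than the tag-based one.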
Besides, some tasks have non-string input/output test cases, which your script does not handle. For example, the extracted test cases for task 4658 are:
{"inputs": [" n = 00000010100101000001111010011100\n", " n = 11111111111111111111111111111101\n"], "outputs": [" 964176192 (00111001011110000010100101000000)\nExplanation: The input binary string 00000010100101000001111010011100 represents the unsigned integer 43261596, so return 964176192 which its binary representation is 00111001011110000010100101000000.\n", " 3221225471 (10111111111111111111111111111111)\nExplanation: The input binary string 11111111111111111111111111111101 represents the unsigned integer 4294967293, so return 3221225471 which its binary representation is 10111111111111111111111111111111. \n"]}
I believe the following tasks may have the same issue.
[2171, 2303, 2307, 2365, 2392, 2466, 2467, 2468, 2469, 2470, 2527, 2528, 2529, 2530, 2531, 2532, 2533, 2534, 2535, 2536, 2627, 2628, 2629, 2630, 2631, 2632, 2633, 2634, 2635, 2636, 2637, 2638, 2639, 2663, 2664, 2665, 2666, 2667, 2668, 2669, 2670, 2671, 2672, 2673, 2674, 2675, 2676, 2677, 2678, 2679, 2680, 2681, 2682, 2683, 2684, 2685, 2686, 2687, 2688, 2689, 2690, 2691, 2692, 2693, 2694, 2695, 2696, 2697, 2698, 2699, 2700, 2701, 2702, 2703, 2704, 2705, 2706, 2707, 2708, 2745, 2746, 2747, 2748, 2882, 2883, 2884, 2885, 2887, 2888, 4479, 4480, 4533, 4534, 4535, 4536, 4658, 4659, 4751, 4752]
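One possible heuristic for flagging such cases is to look for the two patterns visible in the task 4658 output above: an input of the form `name = value`, and an output that carries an `Explanation:` paragraph. This is only a sketch under those assumptions, it is not how the lists in this issue were produced, and `ASSIGNMENT` and `looks_non_standard` are hypothetical names.

```python
import re
from typing import Dict, List

# Assumption: a call-style input starts with "name = value" rather than raw stdin text.
ASSIGNMENT = re.compile(r"^\s*[A-Za-z_]\w*\s*=\s*\S")


def looks_non_standard(case: Dict[str, List[str]]) -> bool:
    """Flag an extracted test case that does not look like plain stdin/stdout text."""
    has_assignment = any(ASSIGNMENT.search(s) for s in case.get("inputs", []))
    has_explanation = any("Explanation:" in s for s in case.get("outputs", []))
    return has_assignment or has_explanation
```

The `case` argument is assumed to have the same shape as the extracted JSON shown above, with `inputs` and `outputs` lists of strings. A flagged case still needs manual review, since the heuristic only looks at surface patterns.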
@zfj1998 thank you!
Yes, some test problems do not have a valid set of example test cases, because the current extraction code can fail on abnormal problem descriptions (as you observed above).
While the total number of these abnormal examples is not significant (out of 5000 test samples in APPS), I agree that the extraction code can be improved to produce higher-quality example test cases.