Simplify test data hierarchy

Question


niemela opened this issue 3 months ago · 5 comments

Currently, the test data is organized in a "tree-like structure" of any shape and depth. All test cases are given as .in files at the leaves, and every internal node is a "group". In addition to providing the structure for grading, groups are typically presented as feedback to users/contestants/judges/teachers.

That said, in practice the test data trees actually used are quite limited:

The root (/) always has exactly two children, /sample and /secret. (This is required by the specification.)
/sample never has any children. I don't know of any system supporting the problem format that would handle such children, and I'm not sure what should happen if it did.
/secret almost never has any grandchildren, and in the few cases where it does (I know of only 3), the grandchildren are used only for organization, not for computation or results. (I.e. the end result would be the same if every /secret/group/subgroup/testcase.in were moved to /secret/group/subgroup_testcase.in.)
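The equivalence claimed above (moving .../subgroup/testcase.in to .../subgroup_testcase.in) can be made concrete with a small path-rewriting helper. This is a hypothetical sketch: the function name flatten_secret_path and the underscore-joining convention are assumptions taken from the example, not part of the problem format.

```python
from pathlib import PurePosixPath


def flatten_secret_path(path: str) -> str:
    """Hypothetical helper: fold any directories below /secret/<group> into the file name.

    /secret/group/subgroup/testcase.in -> /secret/group/subgroup_testcase.in
    Paths that already sit directly in /secret/<group> are returned unchanged.
    """
    parts = PurePosixPath(path).parts  # e.g. ('/', 'secret', 'group', 'subgroup', 'testcase.in')
    if len(parts) < 4 or parts[:2] != ("/", "secret"):
        raise ValueError(f"not a test case under /secret/<group>: {path}")
    group_dir = PurePosixPath(*parts[:3])  # /secret/group
    nested = parts[3:]                     # ('subgroup', 'testcase.in')
    return str(group_dir / "_".join(nested))
```

Joining with underscores can in principle make two distinct paths collide (e.g. a/b_c.in and a_b/c.in both become a_b_c.in), so a real migration would need to check for that.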

For this reason I think we should consider the following:

1. Disallow children of /sample, for the reasons above.
2. Remove the root node, i.e. the overall result is the result of /secret. This disallows directly awarding score for the results of /sample, which is currently done for some (very few) problems. It would still be possible to do so indirectly, by explicitly including the samples somewhere under /secret, either by copying or by symlinking them.
3. Disallow grandchildren of /secret, either by (a) disallowing directories below /secret/<group> completely, or (b) simply disallowing testdata.yaml in directories below /secret/<group> and treating all files below /secret/<group> as belonging to that group (ordered lexicographically by path or by file name).
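A minimal sketch of how a tool might check proposals 1 and 3, assuming test case paths are given relative to the test data root (e.g. "secret/group1/a.in"). The function names are hypothetical; layout_violations implements option (a), and cases_per_group shows the group membership and ordering that option (b) would imply.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List


def layout_violations(case_paths: List[str]) -> List[str]:
    """Check relative .in paths against proposal 1 and option 3(a)."""
    problems = []
    for raw in case_paths:
        parts = PurePosixPath(raw).parts
        if not parts:
            continue
        top = parts[0]
        if top not in ("sample", "secret"):
            problems.append(f"{raw}: root may only contain sample/ and secret/")
        elif top == "sample" and len(parts) > 2:
            # Proposal 1: /sample may not have child groups.
            problems.append(f"{raw}: nested directory under sample/")
        elif top == "secret" and len(parts) > 3:
            # Option 3(a): no directories at all below /secret/<group>.
            problems.append(f"{raw}: directory below secret/<group>")
    return problems


def cases_per_group(case_paths: List[str]) -> Dict[str, List[str]]:
    """Option 3(b): every .in file anywhere below secret/<group> belongs to <group>,
    ordered lexicographically by its full path."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for raw in case_paths:
        parts = PurePosixPath(raw).parts
        if len(parts) >= 3 and parts[0] == "secret":
            groups[parts[1]].append(raw)
    return {name: sorted(paths) for name, paths in groups.items()}
```

Sorting by the full path string is one reading of "lexicographic order on the path"; sorting by file name alone would be the other option mentioned. The remaining half of option (b), checking that no testdata.yaml appears below secret/<group>, is not shown.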

1 and 2 seem like very good ideas, simplifying the test data without losing much. 3 is a bit more controversial.

Thoughts?

Answer 1 · 2024-08-16T14:45:36.000Z

I like this, and I even think I suggested it during the workshop in Lund, but there was some pushback whose details I don't remember.

Answer 2 · 2024-08-16T15:04:37.000Z

One annoying detail is that for pass-fail problems you very much prefer not to run /secret if /sample fails, but for scoring problems you typically(?) do want to run it. It would be a bit annoying to have to specify this explicitly for most problems of one type, but it would also be a bit ugly to have opposite defaults depending on the problem type here...
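The trade-off can be sketched as a single policy function: a type-dependent default with an explicit per-problem override. All names here (ProblemType, judge_secret_after_sample_failure, explicit) are hypothetical and not part of the problem package format; the sketch only illustrates what "opposite defaults based on type" would mean.

```python
from enum import Enum
from typing import Optional


class ProblemType(Enum):
    PASS_FAIL = "pass-fail"
    SCORING = "scoring"


def judge_secret_after_sample_failure(
    problem_type: ProblemType, explicit: Optional[bool] = None
) -> bool:
    """Hypothetical policy: should secret/ still be judged when a sample case fails?"""
    if explicit is not None:
        # An explicit per-problem setting, if present, always wins.
        return explicit
    # Type-dependent default: stop early for pass-fail, keep going for scoring.
    return problem_type is ProblemType.SCORING
```

The alternative raised above is a single default for both types, which means the `explicit` setting would be needed for most problems of the other type.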

Answer 3 · 2024-08-17T08:53:23.000Z

> /sample never has any children. I don't know of any system supporting the problem format that would handle that, and I'm not sure what should happen.

> /secret almost never has any grandchildren, and in the few cases where they do (I know of only 3) the grandchildren are only used for organization, and not for computation or results. (I.e. the end result would be the same if moving all /secret/group/subgroup/testcase.in to /secret/group/subgroup_testcase.in.)

Funnily enough, we had EGOI problems that violated both of these. In https://github.com/zehnsechs/egoi-2024-testdata/tree/main/day2/lightbulbs, one scoring test group wanted to reuse test cases from another, non-scoring test group, and doing that directly with a scoring output validator doesn't work. As a workaround, we created a nested test group inside each group in which the output validator output the number of queries used as the score (with grader_flags: max), and then did the actual scoring using a grader instead. On Kattis, it turns out that nested sample test cases don't show up in the statement, but we didn't need them anyway, since this was an interactive problem and we could put the .interaction files at the top level.

Losing the ability to do this and having to duplicate test cases wouldn't be a huge loss, though. :) Most of the time when this comes up, it's just a matter of rescaling validator scores differently for different groups, which I believe the new spec has a mechanism for.

Answer 4 · 2024-08-17T09:20:21.000Z

We agreed on 1 and 2: no nesting under sample/, and the root node is removed.

For scoring, sample is ignored for determining the final score.
For pass-fail, sample cases are taken into account for the final verdict.
Required/permitted expectations still work on both sample and root.
Scoring expectations are not allowed on the root, on sample, or on individual sample cases.
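A minimal sketch of the agreed rules for the final result and for where scoring expectations may appear. The names Result, final_result and scoring_expectation_allowed are hypothetical. Summing the secret group scores and reporting the first non-AC verdict are assumptions for illustration; the real aggregation is whatever the spec and testdata.yaml define.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    verdict: str                   # e.g. "AC", "WA", "TLE"
    score: Optional[float] = None  # only meaningful for scoring problems


def final_result(problem_type: str, sample: List[Result], secret: List[Result]) -> Result:
    """Results are given in judging order (cases for pass-fail, groups for scoring)."""
    if problem_type == "pass-fail":
        # Sample cases count towards the final verdict: first non-AC result decides.
        for r in sample + secret:
            if r.verdict != "AC":
                return Result(r.verdict)
        return Result("AC")
    # Scoring: with no root node, the final result is that of /secret alone,
    # so `sample` is deliberately ignored. Summing group scores is an assumption.
    total = sum(r.score or 0.0 for r in secret)
    verdict = next((r.verdict for r in secret if r.verdict != "AC"), "AC")
    return Result(verdict, total)


def scoring_expectation_allowed(path: str) -> bool:
    """Scoring expectations may not target the root, sample/, or any sample case."""
    p = path.strip("/")
    return p != "" and p != "sample" and not p.startswith("sample/")
```

For example, a scoring submission that fails a sample case but gets AC on secret groups worth 30 and 70 would still receive 100 under these rules, while the same sample failure makes a pass-fail submission fail.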

For now, we will also disallow grandchildren under secret, as this is almost never used. We could consider adding support for 'organisational-only' grandchildren that do not specify their own aggregation rules and inherit them, so that they may as well have been inlined one level higher.

Answer 5 · 2024-08-17T23:58:44.000Z

These changes have now been merged in.