alexander-bauer/swirlypy

About testing user responses

Closed this issue · 5 comments

R swirl's default, omnitest, was a stopgap measure that became a de facto standard. It is very limited and best not repeated. For an initial prototype, we should implement a few simple tests, but be careful not to rule out the "long tail" cases described below.

Testing user responses resembles unit testing. To a first approximation, there are two objects to test: the expression the user entered, and the result of its evaluation.

IMO, many, many cases would be covered simply by comparing the user's result with a precomputed, correct result. Since, in a valid lesson, correct answers must pass their associated tests (ht @reginaastri), a procedure to generate a dictionary of correct results could also serve as a validity check on lessons.
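A minimal sketch of the idea, assuming a lesson is a list of question dicts with an `"answer"` expression and an optional `"test"` callable (all names here are hypothetical, not existing swirlypy API):

```python
def result_matches(user_value, correct_value):
    """Default test: compare the user's evaluated result to the precomputed one."""
    return user_value == correct_value

def build_answer_key(lesson):
    """Evaluate each question's reference answer once, producing a dictionary
    of correct results. Doubles as a lesson validity check: a reference
    answer that fails its own associated test means the lesson is broken."""
    key = {}
    for number, question in enumerate(lesson):
        correct = eval(question["answer"])  # instructor's reference expression
        test = question.get("test", lambda value, c=correct: value == c)
        if not test(correct):
            raise ValueError("question %d: reference answer fails its own test"
                             % number)
        key[number] = correct
    return key
```

The same pass that builds the key thus validates the lesson, so an author finds out about a broken test before any student does.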

When expressions have side effects (e.g., plots) but no direct effects such as return values, the user's expression must be checked instead. Insisting that the user's expression essentially match the instructor's preconception is too restrictive; R swirl users complain about it all the time. I believe regular expressions would cover most cases we've seen, e.g., testing if a particular function was used. Functions to construct common regexes could be provided. R swirl, for instance, provides the equivalent of regex | with the test any_of_exprs(...).
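A sketch of what such regex-constructing helpers might look like; the names `expr_uses` and `any_of_exprs` are illustrative (the latter borrowed from R swirl), not existing swirlypy functions:

```python
import re

def expr_uses(function_name):
    """Build a test that passes if the user's expression appears to call
    the given function, e.g. expr_uses("plot")."""
    pattern = re.compile(r"\b%s\s*\(" % re.escape(function_name))
    return lambda expression: bool(pattern.search(expression))

def any_of_exprs(*patterns):
    """Analogue of R swirl's any_of_exprs: pass if the expression matches
    any one of the given regular expressions (i.e., regex |)."""
    combined = re.compile("|".join("(?:%s)" % p for p in patterns))
    return lambda expression: bool(combined.search(expression))
```

Matching on the expression text is crude compared to parsing the AST, but it tolerates variation in arguments and spacing, which is exactly the flexibility users ask for.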

Simple checks like matching a correct result or regular expression would handle 80-90% of demand, and would surely be fine for a first prototype. However, there is a long and interesting tail. A trivial case is when a user is asked to generate 100 random numbers. There's no correct result for that question or for any subsequent questions which depend on it. Daphne Koller (Coursera's cofounder) gave the impressive example of a simple test of color balance which suggested a whole class of image processing questions suitable for MOOCs.

Eventually, custom tests should be accommodated. At this stage, I just want to scope the issue and leave the door open.

It may be worthwhile to design a pluggable answer system from the get-go. Then, we can design our default tests around that, and provide all necessary tools to others who might write custom tests.
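One possible shape for that pluggable system, assuming (hypothetically) that every test sees both the expression text and its evaluated value:

```python
from abc import ABCMeta, abstractmethod

class AnswerTest(metaclass=ABCMeta):
    """Base class of the pluggable answer system: built-in tests and
    third-party custom tests would all subclass this."""

    @abstractmethod
    def check(self, expression, value):
        """Return True if the user's response passes.

        expression -- the source text the user entered
        value      -- the result of evaluating it
        """

class ResultEquals(AnswerTest):
    """Default test: compare the evaluated result to a correct answer."""
    def __init__(self, correct):
        self.correct = correct

    def check(self, expression, value):
        return value == self.correct
```

Custom tests then need nothing from us beyond subclassing `AnswerTest`, which keeps the door open for the long tail.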

+1 for above idea.

Tests would be one suite of objects, and would need references to user responses, which would be captured by another suite of objects. This hand-off is so basic that some base control class, perhaps abstract, should support it.
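The hand-off might be as simple as this sketch (names are hypothetical): one object captures the response, the other consumes it.

```python
class Response:
    """Capture of one user response: the raw expression and its value."""
    def __init__(self, expression, value):
        self.expression = expression
        self.value = value

class QuestionController:
    """Minimal base control class supporting the response -> test hand-off."""
    def __init__(self, test):
        self.test = test

    def grade(self, response):
        return self.test(response.expression, response.value)
```

Keeping the two suites decoupled this way means response capture (REPL hooks, history, etc.) can evolve independently of how answers are judged.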

I believe we've decided that questions are the main type of code we need as plugins, and that question types are responsible for their own response testing.
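Under that decision, a question type and its testing live in one class; a hypothetical sketch (none of these names are actual swirlypy API):

```python
import re

class Question:
    """Base question plugin; each question type implements its own
    response testing rather than delegating to a separate test suite."""
    def __init__(self, prompt):
        self.prompt = prompt

    def test_response(self, expression, value):
        raise NotImplementedError

class RegexQuestion(Question):
    """Question type graded on the text of the user's expression,
    e.g. for side-effect-only answers like plots."""
    def __init__(self, prompt, pattern):
        super().__init__(prompt)
        self.pattern = re.compile(pattern)

    def test_response(self, expression, value):
        return bool(self.pattern.search(expression))
```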

I agree.