Writing SCTs for PySpark courses
filipsch opened this issue · 4 comments
Written up by @nicksolomon.
Writing these SCTs is complicated by the need to use the SingleProcessExercise type. This prevents the use of common pythonwhat idioms. For example, it's impossible to use Ex().check_object("obj").has_equal_value(). There isn't a solution process to compare the student's result to.
In the only launched PySpark course, every SCT uses has_equal_ast() with the relevant code. However, this isn't very robust in a lot of cases. Often there are multiple ways to do the same thing, and all of them are equally acceptable. For example (much like in pandas) it's possible to access a column of a Spark DataFrame by typing df.col or df["col"]. These are equivalent when evaluated, but do not have the same AST, so the SCT fails when one is expected and the student provides the other. It would be ideal if this were something we were able to work around.
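The AST mismatch can be seen with Python's standard ast module alone. This is only a minimal sketch of the comparison problem, not how has_equal_ast() is implemented; the name df is just a placeholder in the parsed strings, and no Spark session is needed.

```python
import ast

# Two spellings of the same column access on a placeholder DataFrame `df`.
attr_tree = ast.parse("df.col", mode="eval")
item_tree = ast.parse('df["col"]', mode="eval")

# The root expression nodes have different types, so any tree comparison fails.
attr_kind = type(attr_tree.body).__name__
item_kind = type(item_tree.body).__name__
same_ast = ast.dump(attr_tree) == ast.dump(item_tree)

print(attr_kind, item_kind, same_ast)  # Attribute Subscript False
```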
Additionally, being unable to check objects makes things difficult. This makes check_correct() almost useless, because the idiom
Ex().check_correct(
    check_object("obj").has_equal_value(),
    check_function("obj_creator").check_args(0).has_equal_value()
)
can't be used. That idiom is very useful when there are multiple equivalent ways of creating an object, so losing it further reduces the flexibility of the SCT.
As it stands, only SCTs like has_equal_ast() and has_code() are available, along with logical operators on them (like check_or()). As a result, building a flexible SCT requires a gargantuan, sprawling call to test_or that's difficult to reason about and understand.
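To give a rough sense of how quickly this grows, the sketch below enumerates every code string that would need its own has_equal_ast() branch when each column can be written either way. The helper names and the df.select(...) template are hypothetical and only for illustration; nothing here calls pythonwhat.

```python
from itertools import product


def column_spellings(name):
    """Return both spellings of a column access on a DataFrame named `df`."""
    return [f"df.{name}", f'df["{name}"]']


def accepted_variants(columns, template):
    """Enumerate every code string obtained by picking one spelling per column."""
    choices = [column_spellings(c) for c in columns]
    return [template.format(*combo) for combo in product(*choices)]


# Hypothetical exercise: select three columns from `df`.
variants = accepted_variants(["a", "b", "c"], "df.select({}, {}, {})")

print(len(variants))  # 8
for code in variants[:2]:
    print(code)
```

With three columns there are already 2 ** 3 = 8 acceptable strings, and every additional free choice doubles the count again.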
@filipsch, thanks for porting this from Slack to GH. I'm looping myself into this issue since I'm a main stakeholder in this.
cc: @yashasroy
What is the timeline on this? I'm afraid there is no easy solution here, and it's going to take a long time to improve this. There is no time left in my planning (and it's not included in my OKRs) to work on this this quarter.
One PySpark course is expected to launch in the next ~3 weeks. The other course in dev will take a bit more time; I don't have a good estimate yet.
I've had a call with @adrian-datacamp and this should all be better in v2.15.3. If there are issues with this, please create a new issue in this repo.