I (Jared Claypooole) created this repository to rebuild and document the enhancements I made to Gradescope's example Python autograder implementation.
I made these changes in their original form to fit the needs of our course while working as a teaching assistant for UCLA's Physics 180N course (Computational Physics Lab), supervised by the course instructor Josh Samani.
See also Gradescope's autograder documentation.
I originally made these enhancements in a fairly "quick and dirty" manner, with a lot of things I've since decided to clean up. I also need to keep private the exact tests we developed to (partially) grade students' code, so that these may be reused in future iterations of the course.
As such, I've created this separate public demo, so that others may adapt my autograder enhancements to their courses. I've attempted to use the following principles to improve the quality of the project:
- Write cleaner code that's easier to read and maintain
- Use automated (meta) tests to unit test the code
- Employ integration tests (in a Docker container) to simulate Gradescope's use of the autograder and make sure everything is working properly
- Simulate a review process by reviewing GitHub pull requests and addressing issues identified there
- Include better project and code documentation for ease of use
My original changes included five major enhancements:
- pull-from-git
  - Allow updates to the autograder to be dynamically pulled via git
- required-files
  - Automate filename copying and checking (with feedback to students)
- time-limited
  - Integrate time limits by running tests in their own processes, redirecting stdout/stderr
- fcn-tests
  - Automate scanning a matrix of parameters in tests, particularly for numerical functions
- correct-with-git
  - Create a framework for the grader to incrementally correct students' code using git, preventing minor mistakes from disproportionately affecting grades
So far, the first two enhancements have been merged into this project, and the remaining are in development (likely with some code being pushed on branches on this repo, or even with open pull requests).
Gradescope's default autograder configuration workflow requires the grader to upload the autograder in a zip file, which Gradescope then uses to build a Docker image -- an image which will then be used to spawn an autograder instance every time the autograder is run on a student's submission. The problem is that this build process takes a few minutes, and the Docker image needs to be rebuilt every time the grader wishes to make a change to the autograder. This often creates a frustrating bottleneck in the debugging process.
With this pull-from-git enhancement, we can circumvent the need to rebuild the autograder image by downloading updates dynamically via git. Every time the autograder is run on a student's assignment submission, the autograder checks for any updates that have been pushed to the remote repository since the autograder was configured (i.e., since the project directory was zipped and uploaded to Gradescope's website, and the setup script was run). The autograder uses the settings in the project's .git directory to pull any such updates.
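As a rough illustration of the update step described above, here is a minimal Python sketch. The project's real update logic lives in a shell script, so the function names `update_commands` and `run_update`, and the choice to drive git through `subprocess`, are assumptions made for this example only.

```python
import subprocess


def update_commands(reset=True):
    """Return the git commands for one update pass, in order."""
    commands = []
    if reset:
        # Discard uncommitted local changes before pulling.
        commands.append(["git", "reset", "--hard"])
    # Pull from the upstream configured for the current branch in .git.
    commands.append(["git", "pull"])
    return commands


def run_update(repo_dir, reset=True, runner=subprocess.run):
    """Run each update command inside repo_dir; return True if all succeed.

    `runner` is injectable (hypothetical design choice) so the flow can be
    exercised without touching a real repository.
    """
    for command in update_commands(reset):
        completed = runner(command, cwd=repo_dir, capture_output=True, text=True)
        if completed.returncode != 0:
            return False
    return True
```

Calling `run_update("/path/to/autograder")` would attempt the reset and the pull in that directory; passing `reset=False` skips the reset.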
Notes:
- This type of enhancement was explicitly suggested in Gradescope's autograder docs, and my code is partially based on their example code.
- The update process is performed not only just before the autograder is run on a student's submission, but also just after the autograder is uploaded to Gradescope -- when setup.sh is run
- The autograder runs git reset --hard before doing this, which is a workaround for Windows users. This means that any changes to the autograder must be committed (using git commit) before being uploaded (see issue #2)
  - One can disable this reset by removing or commenting out the git reset --hard line from update.sh
Inner workings:
- The remote repository must be specified in the project's .git directory
  - The .git directory retains its settings from before the project directory is zipped and uploaded to Gradescope during the "configure autograder" step
  - Relevant settings are the current branch, that branch's upstream remote, and that remote repository's url
- The autograder will only check for updates in a remote repository that is set as the "upstream" for the current branch
  - This is most easily done by adding the -u flag to a typical git push command -- e.g., git push -u <remote-name> <branch-name>
  - See also git branch --set-upstream-to
- The autograder needs read-access to the remote repository in order to fetch updates
  - Assuming your remote repo isn't public (or else students could get explicit access to your tests), you probably want to authenticate with an SSH key. For a repository on GitHub, this can either be a "deploy key" or a "machine key"
  - Make sure you've set up your remote repo with an ssh url rather than an http one
    - You can verify this by running git remote -v and verifying that your remote has a url which looks like git@github.com:<username>/<repo-name>.git rather than https://github.com/<username>/<repo-name>.git
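The ssh-versus-http check on the remote url can also be automated. The sketch below is only an illustration: `remote_urls` and `is_ssh_remote_url` are hypothetical helper names, and the parsing assumes the usual three-column output of git remote -v (name, url, and a "(fetch)" or "(push)" marker).

```python
def remote_urls(remote_verbose_output):
    """Parse `git remote -v` output into {remote_name: fetch_url}."""
    urls = {}
    for line in remote_verbose_output.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[2] == "(fetch)":
            urls[parts[0]] = parts[1]
    return urls


def is_ssh_remote_url(url):
    """Return True for ssh-style remotes, False for http(s) ones."""
    url = url.strip()
    if url.startswith(("http://", "https://")):
        return False
    if url.startswith("ssh://"):
        return True
    # scp-like syntax, e.g. git@github.com:<username>/<repo-name>.git
    host_part, separator, path = url.partition(":")
    return (bool(separator) and "@" in host_part
            and "/" not in host_part and bool(path))
```

Feeding the text printed by git remote -v into `remote_urls` and then passing each url to `is_ssh_remote_url` flags any remote that would need to be switched to an ssh url.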
Again, this enhancement works best when all changes you've made before zipping and uploading to the "create autograder" page are committed, and pushed to the remote repo. Otherwise the updating process is likely to break, possibly in silent or unexpected ways.
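One way to guard against zipping an uncommitted working tree is a small pre-upload check. This is a sketch under assumptions, not part of the project: the helper names are hypothetical, and it relies on git status --porcelain printing one line per changed or untracked file and nothing when the tree is clean. It does not check whether commits have been pushed.

```python
import subprocess


def has_uncommitted_changes(porcelain_output):
    """True if `git status --porcelain` output lists any file."""
    return any(line.strip() for line in porcelain_output.splitlines())


def working_tree_is_clean(repo_dir="."):
    """Ask git whether repo_dir has no modified or untracked files."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return not has_uncommitted_changes(result.stdout)
```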
Gradescope's autograder places student submissions in the /autograder/submission directory, but the default Python implementation runs tests from the /autograder/tests directory, generally requiring any files tested to be moved there. This enhancement allows the user to simply list all required files in required-files.txt, and then the autograder automatically copies them over and verifies their existence, with feedback for students.
Inner workings:
- A "required file" is any filename listed in required-files.txt
- A series of unit tests within tests/test_file_existence.py check that each required file exists
  - If a file doesn't exist in a student's submission, the Gradescope web interface will show this to the student as a failed test
- Before unit tests are run on a student's submission, required files are automatically copied from /autograder/submission to /autograder/tests, using the copy_files.py script
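The copy-and-check behavior listed above can be sketched as follows. This is not the project's copy_files.py; the function names and the one-filename-per-line list format are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def read_required_files(list_path):
    """Return the non-empty, stripped lines of the required-files list."""
    lines = Path(list_path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def copy_required_files(required, submission_dir, tests_dir):
    """Copy each required file that exists; return the names that were missing."""
    submission_dir, tests_dir = Path(submission_dir), Path(tests_dir)
    missing = []
    for name in required:
        source = submission_dir / name
        if source.is_file():
            destination = tests_dir / name
            # Create the destination folder (and any subfolders) on demand.
            destination.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy(source, destination)
        else:
            missing.append(name)
    return missing
```

The returned list of missing names is what a file-existence test could report back to the student.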
Gradescope's autograder allows individual tests to run indefinitely, but limits the total autograder runtime for an individual submission to 20 minutes, at which point the autograder crashes and the submission receives no credit.
This module allows tests to be given time limits, by running tests in their own processes. Crucially, stdout and any exceptions are redirected to give feedback to students.
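To make the idea concrete, here is a minimal sketch of running code in its own process with a time limit while capturing its output. It is a simplification, not the module itself: it runs a Python snippet through the standard library's `subprocess` module rather than a unittest method, and `run_with_time_limit` is a hypothetical name.

```python
import subprocess
import sys


def run_with_time_limit(code, seconds):
    """Run a Python snippet in a child process.

    Returns (finished, output). finished is False if the limit was hit;
    output is the child's combined stdout and stderr, so tracebacks from
    uncaught exceptions are included.
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout
            text=True,
            timeout=seconds,
        )
    except subprocess.TimeoutExpired as expired:
        # Output captured before the timeout may be None or bytes.
        partial = expired.stdout or ""
        if isinstance(partial, bytes):
            partial = partial.decode(errors="replace")
        return False, partial + f"\nTest exceeded the {seconds} second time limit."
    return True, completed.stdout
```

For example, `run_with_time_limit("while True: pass", 1)` returns after about one second with `finished` set to `False`, instead of consuming the submission's whole runtime budget.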
We found that our tests of students' code tended to involve verifying that a long list of inputs gave the correct output. Since Python's unittest framework lends itself better to more individual test cases, I created essentially a factory for test methods. The factory builds these methods primarily from a list of inputs or a dict of inputs mapped to expected outputs.
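A test-method factory of this kind might look roughly like the sketch below. It is an illustration rather than the fcn-tests code: `make_test`, `add_function_tests`, and the `square` stand-in function are hypothetical, and the comparison uses `math.isclose` with a relative tolerance as one reasonable choice for numerical functions.

```python
import math
import unittest


def make_test(func, args, expected, rel_tol=1e-9):
    """Build one test method checking func(*args) against expected."""
    def test(self):
        result = func(*args)
        self.assertTrue(
            math.isclose(result, expected, rel_tol=rel_tol),
            msg=f"{func.__name__}{args} returned {result}, expected {expected}",
        )
    return test


def add_function_tests(test_case_cls, func, cases, rel_tol=1e-9):
    """Attach one generated test method per (args -> expected) entry."""
    for index, (args, expected) in enumerate(cases.items()):
        setattr(test_case_cls, f"test_{func.__name__}_{index}",
                make_test(func, args, expected, rel_tol))
    return test_case_cls


def square(x):
    # Stand-in for a student's numerical function.
    return x * x


class SquareTests(unittest.TestCase):
    """Empty on purpose; its test methods are generated below."""


add_function_tests(SquareTests, square, {(2.0,): 4.0, (3.0,): 9.0, (-1.5,): 2.25})
```

Each dict entry becomes its own test method, so a failure on one input is reported separately from the others. Note that a purely relative tolerance never matches an expected value of exactly zero unless the result is also exactly zero; an absolute tolerance would be needed for that case.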
Often a student would make a relatively minor mistake that would drastically affect their autograder score. Moreover, when trying to understand students' code, it helped to have the ability to make changes and see how the test results were affected -- essentially providing a debugging process for the graders.
This was accomplished by version controlling students' submissions using git, and pulling any changes that might have been made by the grader on the remote.
- Create a branch for each student
  - Create a different repo for each assignment
  - Could also have named branches with assignmentname/studentname
- Stage and commit all files (and changes thereto) in the submission directory, including metadata
- Push to the remote
- Attempt to pull any changes from the remote
  - This will only be successful if the student has NOT updated their submission (i.e., there was nothing to push). Otherwise nothing will be pulled.
  - Any changes that are pulled will be changes the grader made to the assignment
Notes on scoring:
- Obviously correcting students' code will increase their score
- Typically the grader will want the student to receive partial (or zero) credit for the mistakes that were corrected
- In my implementation this was done entirely manually, in a dedicated manually graded "problem" called something like "Modifications to the autograder score"
- It would be possible for these score modifications to be done in some automated way via git (perhaps in a json file or something), but I haven't implemented anything of that sort
What the student sees:
- If the student weren't notified somehow, they wouldn't realize the code being graded by the autograder was a modified version of their submission
- I accomplished this through a printout of the git log (which displays the commit messages) in a dedicated autograder test
- This depends upon the grader to create descriptive commit messages detailing the changes made
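A dedicated test that prints the commit history could be sketched as below. This is an assumption-laden illustration, not the original implementation: the class and helper names are hypothetical, it assumes the test runs inside the git working tree, and it assumes the test runner shows printed output to the student.

```python
import subprocess
import unittest


def commit_messages(oneline_log):
    """Extract commit messages from `git log --oneline` output.

    Each line is expected to be an abbreviated hash, a space, then the
    message; lines without a message are skipped.
    """
    messages = []
    for line in oneline_log.splitlines():
        _commit_hash, _, message = line.strip().partition(" ")
        if message:
            messages.append(message)
    return messages


class TestShowModifications(unittest.TestCase):
    def test_print_git_log(self):
        """Print the commit messages so they appear in this test's output."""
        log = subprocess.run(
            ["git", "log", "--oneline"], capture_output=True, text=True
        ).stdout
        for message in commit_messages(log):
            print("-", message)
```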
First, see Gradescope's autograder documentation, which explains how to use and customize the autograder.
Then clone this repository and add to the tests directory according to your needs.
Any files of the form tests/test*.py will be searched for unit tests.
Modify required_files.txt to list the names of files you expect in students' submissions.
Important: Before zipping, be sure to either git commit all your changes or remove the git reset --hard line from update.sh. (See the pull-from-git section above for more details.)
Finally, zip the contents of this directory and upload it to Gradescope during the "Autograder Configuration" step.
- I strongly recommend creating a separate branch for each assignment on Gradescope
  - Use the git branch or git checkout -b command, or see one of the numerous online tutorials about creating a new branch in git
  - I originally made the mistake of keeping a separate copy of the entire autograder in a new directory for each assignment, which quickly turned into a mess when it came to keeping track of which changes were made to which assignment's autograder copy. Git branches are designed precisely to make this sort of thing much less painful