Add automated testing for scraped schedules
davidschlachter opened this issue · 2 comments
Course updates frequently break scraping in ways that are currently detected only by my very incomplete manual testing or by bug reports from users. Automated testing to validate scraped schedules would improve the program's reliability for users.
However, the main challenge is validating schedules without producing the validation data with the same system that scrapes the data under test.
One possible solution is to detect situations that are known to cause silent failures. For example, suppose two sections have the following activities:
- Section A: two lectures, three DGDs
- Section B: one DGD
This pattern would indicate an error in deriving sections from the uOttawa data, and it would cause silent failures for users. Situations like this could be automatically flagged and trigger a notification that scraping has partially failed (a detection sketch follows below). These errors should be tested for and raised in the scraping program, since scraping is now the most common failure point for users of this project.
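Here is a minimal sketch of such a check in Python. The data shape (a course as a dict mapping section names to lists of activity-type strings) and the name `find_suspect_courses` are assumptions for illustration, not this project's actual structures:

```python
from collections import Counter

# Hypothetical data shape, for illustration only: a course maps section
# names (e.g. "A", "B") to lists of activity-type strings ("LEC", "DGD", ...).
def find_suspect_courses(courses):
    """Flag (course, section) pairs whose activity mix suggests that
    activities were grouped into the wrong section during scraping."""
    suspects = []
    for code, sections in courses.items():
        has_lecture = {
            name: Counter(acts)["LEC"] > 0 for name, acts in sections.items()
        }
        for name in sections:
            # A section with no lecture, while a sibling section of the
            # same course does have lectures, is the silent-failure
            # pattern described above (e.g. Section B: one DGD).
            if not has_lecture[name] and any(has_lecture.values()):
                suspects.append((code, name))
    return suspects

# Example: the pattern from this issue is flagged.
courses = {
    "CEG2136": {
        "A": ["LEC", "LEC", "DGD", "DGD", "DGD"],
        "B": ["DGD"],
    }
}
print(find_suspect_courses(courses))  # [('CEG2136', 'B')]
```

A check like this needs no reference data at all: it only asserts an invariant (every section should include a lecture) that real uOttawa schedules are expected to satisfy.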
If this is instead implemented by comparing scraped schedules to reference schedules, the most commonly searched courses would be the best candidates (a test sketch follows the list). Here are the top courses from April 2018 – April 2020, with the number of times each was searched:
| Searches | Course |
| ---: | --- |
| 496 | CEG2136 |
| 451 | CSI2110 |
| 411 | SEG2105 |
| 392 | ENG1112 |
| 371 | MAT1320 |
| 331 | MAT1322 |
| 331 | CHM1311 |
| 321 | MAT1341 |
| 316 | ITI1120 |
| 316 | ECO1104 |
| 285 | MAT2377 |
| 260 | PHI1101 |
| 239 | ECO1102 |
| 214 | ENG1100 |
| 211 | ADM1340 |
| 205 | MAT1348 |
| 204 | ITI1121 |
| 194 | ITI1100 |
| 192 | ECO1504 |
| 190 | PSY1102 |
| 190 | ECO1502 |
| 189 | CSI2132 |
| 177 | PSY1101 |
| 172 | MAT1300 |
| 172 | CSI2101 |
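A regression test along these lines could cover the top courses. This is a hedged sketch only: the reference-file layout under `tests/references/`, the `fetch_course()` hook, and the `TOP_COURSES` wiring are assumptions, not the project's actual API:

```python
import json
import unittest
from pathlib import Path

# Assumptions for this sketch: one hand-curated JSON reference file per
# course under tests/references/, and a fetch_course() hook standing in
# for the real scraper. Neither exists in the project under these names.
REFERENCE_DIR = Path("tests/references")
TOP_COURSES = ["CEG2136", "CSI2110", "SEG2105", "ENG1112", "MAT1320"]

def fetch_course(code):
    """Stand-in for the real scraper; replace with the project's own call."""
    raise NotImplementedError

class ReferenceScheduleTests(unittest.TestCase):
    def test_top_courses_match_references(self):
        for code in TOP_COURSES:
            with self.subTest(course=code):
                reference = json.loads((REFERENCE_DIR / f"{code}.json").read_text())
                # References must be curated by hand (or taken from an
                # independent source) so that a shared scraper bug can't
                # make the test pass.
                self.assertEqual(fetch_course(code), reference)

if __name__ == "__main__":
    unittest.main()
```

The key design point is that the reference files come from somewhere other than the scraper under test, which addresses the validation-data concern raised above.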
Implemented in 58a1c38 after an email conversation with the uschedule.me team. Each time schedules are updated, the test results will be available at https://schlachter.ca/schedgen/latest-unit-test-results.txt.