nhho/sourcedownloader

A script for downloading resources from course webpages

Python

SourceDownloader

Environment

Python 2.7 with the following packages:

beautifulsoup4
requests

Features

This script downloads resource files from the specified URLs
- Files are identified by <a> HTML tags (except on Piazza, see below)
Piazza resource pages are also supported
- Make sure the URL has the form https://piazza.com/XXX/YYY/ZZZ/resources
The output folders are compared with existing output folders (possibly generated by a previous run)
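
As an illustration of the <a> tag identification described above, here is a minimal sketch of collecting link targets from a page. It is not the script's actual code: it uses the standard library's HTMLParser instead of beautifulsoup4 so that it runs without extra packages, and the names LinkCollector and extract_links as well as the example.edu URLs are made up for this example.

```python
try:
    from html.parser import HTMLParser   # Python 3
    from urllib.parse import urljoin
except ImportError:
    from HTMLParser import HTMLParser    # Python 2.7
    from urlparse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Anchors without an href (e.g. <a name="top">) are skipped.
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return absolute URLs for all <a href=...> tags in the given HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical page content and base URL, for demonstration only.
page = ('<p><a href="notes/lec1.pdf">Lecture 1</a> '
        '<a href="/hw/hw1.pdf">HW 1</a> <a name="top">x</a></p>')
links = extract_links(page, "https://example.edu/course/")
print(links)
```

Relative links are resolved against the page URL with urljoin, so the same file is recognized whether the page links to it relatively or absolutely.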

Sample output

Create folder [output]
=== CSCI3130\1819-sem1 ===
39 files from https://www.cse.cuhk.edu.hk/~siuon/csci3130/
total 8.66MiB
New folder [output\CSCI3130\1819-sem1] (does not exist in folder [old_output])
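
The comparison against a previous run could look roughly like the sketch below. This is an assumption about how such a check might work, not the script's actual logic: the sample output reports new folders, while this example compares at file level for simplicity. The function names list_relative_files and new_files are hypothetical, and the demo builds throwaway folders in a temporary directory.

```python
import os
import tempfile


def list_relative_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def new_files(output_dir, old_output_dir):
    """Return files present in output_dir but missing from old_output_dir."""
    if os.path.isdir(old_output_dir):
        old = list_relative_files(old_output_dir)
    else:
        old = set()  # no previous run: everything counts as new
    return sorted(list_relative_files(output_dir) - old)


# Demo with temporary folders standing in for output and old_output.
base = tempfile.mkdtemp()
out_dir = os.path.join(base, "output")
old_dir = os.path.join(base, "old_output")
os.makedirs(os.path.join(out_dir, "CSCI3130"))
os.makedirs(os.path.join(old_dir, "CSCI3130"))
for path in (os.path.join(out_dir, "CSCI3130", "a.pdf"),
             os.path.join(out_dir, "CSCI3130", "b.pdf"),
             os.path.join(old_dir, "CSCI3130", "a.pdf")):
    open(path, "w").close()

added = new_files(out_dir, old_dir)
print(added)
```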

Usage

Enter course info in FOLDER_URL
Customize the whitelist and blacklist of resource file extensions in SUFFIX and SUFFIX_IGNORE respectively
Customize the whitelist and blacklist of resource URLs in WHITELIST and BLACKLIST respectively
If you use Piazza, fill in PIAZZA_EMAIL and PIAZZA_PASSWORD
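
To make the roles of the four filter settings concrete, here is a hedged sketch of how they might interact. The values, the data shapes (tuples of strings), the prefix matching, and the precedence (BLACKLIST first, then WHITELIST, then extensions) are all assumptions for illustration; check the script itself for the real behavior. The helper should_download is hypothetical.

```python
try:
    from urllib.parse import urlparse   # Python 3
except ImportError:
    from urlparse import urlparse       # Python 2.7

# Hypothetical values; the real script's defaults and formats may differ.
SUFFIX = (".pdf", ".ppt", ".pptx", ".zip")
SUFFIX_IGNORE = (".html", ".php")
WHITELIST = ("https://example.edu/course/special-handout",)
BLACKLIST = ("https://example.edu/course/archive/",)


def should_download(url):
    """Decide whether a URL looks like a resource worth downloading."""
    # Assumed precedence: an explicit blacklist entry always wins.
    if any(url.startswith(prefix) for prefix in BLACKLIST):
        return False
    # Whitelisted URLs are accepted regardless of their extension.
    if any(url.startswith(prefix) for prefix in WHITELIST):
        return True
    # Otherwise decide by file extension, ignoring any query string.
    path = urlparse(url).path.lower()
    if path.endswith(SUFFIX_IGNORE):
        return False
    return path.endswith(SUFFIX)


print(should_download("https://example.edu/course/lec1.pdf"))
print(should_download("https://example.edu/course/index.html"))
```

Matching on the parsed URL path rather than the raw string keeps a link such as lec2.pdf?download=1 from being rejected because of its query string.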