A script for downloading resources from course webpages
Python 2.7 with the following packages:
- beautifulsoup4
- requests
- This script will download the resource files from the specified urls
- Resource links are identified by the <a> HTML tag (except on Piazza, see below)
- Piazza resource pages are also supported
- Make sure the url has the form https://piazza.com/XXX/YYY/ZZZ/resources
- The output folders will be compared with the existing output folders (possibly generated by a previous run)
Create folder [output]
=== CSCI3130\1819-sem1 ===
39 files from https://www.cse.cuhk.edu.hk/~siuon/csci3130/
total 8.66MiB
New folder [output\CSCI3130\1819-sem1] (does not exist in folder [old_output])
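The folder comparison reported in the last line of the sample output can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function name new_subfolders and the OTHER-COURSE folder are hypothetical, and the sketch assumes that only the topmost missing folder is reported.

```python
import os
import tempfile


def new_subfolders(output_dir, old_output_dir):
    """Return relative paths of folders in output_dir that old_output_dir lacks.

    Assumption: once a folder is missing, its subfolders are not reported.
    """
    missing = []
    for root, dirs, _files in os.walk(output_dir):
        for name in list(dirs):
            rel = os.path.relpath(os.path.join(root, name), output_dir)
            if not os.path.isdir(os.path.join(old_output_dir, rel)):
                missing.append(rel)
                dirs.remove(name)  # prune: do not descend into a new folder
    return sorted(missing)


# Demo on throwaway directories (the layout is made up for illustration).
base = tempfile.mkdtemp()
output_dir = os.path.join(base, "output")
old_output_dir = os.path.join(base, "old_output")
os.makedirs(os.path.join(output_dir, "CSCI3130", "1819-sem1"))
os.makedirs(os.path.join(output_dir, "OTHER-COURSE"))
os.makedirs(os.path.join(old_output_dir, "OTHER-COURSE"))

for rel in new_subfolders(output_dir, old_output_dir):
    print("New folder [%s] (does not exist in folder [%s])"
          % (os.path.join("output", rel), "old_output"))
```

Pruning the walk keeps the report short when a whole course is new; dropping the dirs.remove line would list every nested folder instead.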
- Enter course info in FOLDER_URL
- Customize the whitelist and blacklist of resource file extensions in SUFFIX and SUFFIX_IGNORE respectively
- Customize the whitelist and blacklist of resource urls in WHITELIST and BLACKLIST respectively
- If you use piazza, fill in PIAZZA_EMAIL and PIAZZA_PASSWORD
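To illustrate how these settings might interact with the <a> tag detection, here is a rough sketch. Several things in it are assumptions rather than the script's real behaviour: the value formats of the four settings, the rule order (url lists first, then extension lists), the prefix matching for urls, and the helper names LinkCollector, is_resource and find_resources. The script itself depends on beautifulsoup4; this sketch swaps in the standard library HTML parser so that it runs without extra packages.

```python
try:  # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urljoin, urlparse
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser
    from urlparse import urljoin, urlparse

# Hypothetical values; the real script may expect a different format.
SUFFIX = [".pdf", ".ppt", ".pptx"]
SUFFIX_IGNORE = [".html", ".php"]
WHITELIST = []
BLACKLIST = ["https://example.com/course/old/"]


class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href=...> tag."""

    def __init__(self, base_url):
        HTMLParser.__init__(self)
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def is_resource(url):
    """Assumed rule order: url blacklist, url whitelist, then extensions."""
    if any(url.startswith(prefix) for prefix in BLACKLIST):
        return False
    if any(url.startswith(prefix) for prefix in WHITELIST):
        return True
    path = urlparse(url).path.lower()
    if any(path.endswith(suffix) for suffix in SUFFIX_IGNORE):
        return False
    return any(path.endswith(suffix) for suffix in SUFFIX)


def find_resources(html, base_url):
    """Return the links in html that pass the filters above."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return [url for url in parser.links if is_resource(url)]


page = ('<a href="notes/lec01.pdf">Lecture 1</a> '
        '<a href="index.html">Home</a> '
        '<a href="old/lec00.pdf">Old notes</a>')
print(find_resources(page, "https://example.com/course/"))
```

With the sample values, only the lecture PDF survives: index.html is dropped by its extension and the old notes by the url blacklist.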