SourceDownloader

A script for downloading resources from course webpages

Environment

Python 2.7 with the following packages:

  • beautifulsoup4
  • requests

Features

  1. The script downloads resource files from the specified URLs
    • Resources are identified by <a> HTML tags (except on Piazza, see below)
  2. Piazza resource pages are also supported
    • Make sure the URL has the form https://piazza.com/XXX/YYY/ZZZ/resources
  3. The output folders are compared with the existing output folders (possibly generated by a previous run)
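
The link discovery in feature 1 can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: it uses the standard-library HTML parser in place of beautifulsoup4 so that it runs without extra packages, and `LinkCollector`, `extract_links`, and the example.edu URL are hypothetical names chosen for the example.

```python
try:  # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urljoin
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser
    from urlparse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag (hypothetical helper)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Return absolute URLs for all <a href> links found in html."""
    collector = LinkCollector()
    collector.feed(html)
    # Relative links are resolved against the page they were found on.
    return [urljoin(base_url, href) for href in collector.hrefs]


page = ('<a href="notes/lec1.pdf">Lecture 1</a> '
        '<a href="/hw/hw1.pdf">HW 1</a> '
        '<a>no link</a>')
links = extract_links(page, "https://example.edu/course/")
print(links)
```

Anchors without an `href` are skipped, and relative links are turned into absolute URLs before any extension or URL filtering is applied.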

Sample output

Create folder [output]
=== CSCI3130\1819-sem1 ===
39 files from https://www.cse.cuhk.edu.hk/~siuon/csci3130/
total 8.66MiB
New folder [output\CSCI3130\1819-sem1] (does not exist in folder [old_output])
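
The last line of the sample output reports a folder that is missing from a previous run's output. A comparison of that kind could be sketched as below, assuming it only needs to detect subfolders that are new. `list_subfolders` and `report_new_folders` are hypothetical names, and the real script may compare at a different granularity (for example, only the course folders, or individual files).

```python
import os


def list_subfolders(root):
    """Return the set of subfolder paths under root, relative to root."""
    found = set()
    # os.walk yields nothing if root does not exist, so a missing
    # old output folder is treated as empty.
    for dirpath, dirnames, _ in os.walk(root):
        for name in dirnames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found


def report_new_folders(output, old_output):
    """Print and return subfolders present in output but not in old_output."""
    new = sorted(list_subfolders(output) - list_subfolders(old_output))
    for rel in new:
        print("New folder [%s] (does not exist in folder [%s])"
              % (os.path.join(output, rel), old_output))
    return new
```

Calling `report_new_folders("output", "old_output")` after a run would list every subfolder that the previous run did not produce.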

Usage

  1. Enter course info in FOLDER_URL
  2. Customize the whitelist and blacklist of resource file extensions in SUFFIX and SUFFIX_IGNORE respectively
  3. Customize the whitelist and blacklist of resource URLs in WHITELIST and BLACKLIST respectively
  4. If you use Piazza, fill in PIAZZA_EMAIL and PIAZZA_PASSWORD
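
As an illustration of the steps above, the settings might look like the sketch below. The variable names come from this README, but their shapes and all values are assumptions: the script may expect a different structure for FOLDER_URL, and the filter semantics shown in the hypothetical `should_download` helper (whitelist wins, then blacklist, then extension checks) are only one plausible reading.

```python
import os

# Assumed shape: output subfolder -> list of course pages to scan.
FOLDER_URL = {
    os.path.join("CSCI3130", "1819-sem1"): [
        "https://www.cse.cuhk.edu.hk/~siuon/csci3130/",
    ],
}

# File extensions to download and to skip (placeholder values).
SUFFIX = [".pdf", ".ppt", ".pptx", ".zip"]
SUFFIX_IGNORE = [".html", ".htm"]

# Exact resource URLs to always / never download (placeholder values).
WHITELIST = ["https://example.edu/course/syllabus"]
BLACKLIST = ["https://example.edu/course/old/hw1.pdf"]

# Only needed for Piazza resource pages; left empty here.
PIAZZA_EMAIL = ""
PIAZZA_PASSWORD = ""


def should_download(url):
    """Hypothetical filter combining the four lists above."""
    if url in WHITELIST:
        return True
    if url in BLACKLIST:
        return False
    lower = url.lower()
    if lower.endswith(tuple(SUFFIX_IGNORE)):
        return False
    return lower.endswith(tuple(SUFFIX))


print(should_download("https://example.edu/course/lec1.PDF"))
```

Under these assumptions a whitelisted URL is downloaded even without a matching extension, and a blacklisted URL is skipped even when its extension is in SUFFIX.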