Primary language: Python. License: MIT.

Ilias Downloader UniMA

A simple Python package for downloading files from https://ilias.uni-mannheim.de.

  • Synchronizes automatically on every run: only new or updated files and videos are downloaded.
  • Uses the BeautifulSoup package for scraping and the multiprocessing package to speed up the download.

Install

The easiest way is via pip:

pip3 install iliasDownloaderUniMA

Otherwise you can clone or download this repo and then run

python3 setup.py install 

inside the repo directory.

Usage

Starting from version 0.5.0, only your uni_id and your password are required. A simple script that downloads all files of the current semester looks like this:

from IliasDownloaderUniMA import IliasDownloaderUniMA

m = IliasDownloaderUniMA()
m.setParam('download_path', '/path/where/you/want/your/files/')
m.login('your_uni_id', 'your_password')
m.addAllSemesterCourses()
m.downloadAllFiles()

The method addAllSemesterCourses() adds all courses of the current semester by default. However, it's possible to modify the search behaviour by passing a regex pattern for semester_pattern. Here are some examples:

# Add all courses from your ilias main page from year 2020:
m.addAllSemesterCourses(semester_pattern=r"\([A-Z]{2,3} 2020\)")
# Add all FSS/ST courses from your ilias main page:
m.addAllSemesterCourses(semester_pattern=r"\((FSS|ST) \d{4}\)")
# Add all HWS/WT courses from your ilias main page:
m.addAllSemesterCourses(semester_pattern=r"\((HWS|WT) \d{4}\)")
# Add all courses from your ilias main page. Even non-regular semester
# courses like 'License Information (Student University of Mannheim)',
# i.e. courses without a semester inside the course name:
m.addAllSemesterCourses(semester_pattern=r"\(.*\)")

You can also exclude courses by passing a list of the corresponding ilias ref ids you want to exclude:

# Add all courses from your ilias main page. Even non-regular semester
# courses. Except the courses with the ref id 954265 or 965389.
m.addAllSemesterCourses(semester_pattern=r"\(.*\)", exclude_ids=[954265, 965389])
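
To preview which courses a given semester_pattern would select, the patterns above can be tried against a few course titles with the standard re module. This is only a sketch: the course titles are made up, matching_titles is a hypothetical helper, and it assumes the pattern is searched anywhere in the course title (re.search semantics), which the package may or may not do internally.

```python
import re

# Hypothetical course titles; the real ones come from your ILIAS main page.
course_titles = [
    "Analysis I (HWS 2020)",
    "Numerik (FSS 2021)",
    "Algebra (ST 2020)",
    "License Information (Student University of Mannheim)",
]

def matching_titles(semester_pattern, titles):
    """Return the titles in which the pattern occurs (assumes re.search semantics)."""
    regex = re.compile(semester_pattern)
    return [title for title in titles if regex.search(title)]

# All courses from 2020, regardless of the semester abbreviation.
print(matching_titles(r"\([A-Z]{2,3} 2020\)", course_titles))
# All FSS/ST courses of any year.
print(matching_titles(r"\((FSS|ST) \d{4}\)", course_titles))
# Anything with a parenthesised suffix, including non-semester courses.
print(matching_titles(r"\(.*\)", course_titles))
```

Checking a pattern this way before passing it to addAllSemesterCourses() helps avoid accidentally adding or skipping courses.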

A more specific example:

from IliasDownloaderUniMA import IliasDownloaderUniMA

m = IliasDownloaderUniMA()
m.setParam('download_path', '/Users/jonathan/Desktop/')
m.login('jhelgert', 'my_password')
m.addAllSemesterCourses(exclude_ids=[1020946])
m.downloadAllFiles()

Note that the backslash \ is an escape character inside a Python string, so on a Windows machine the backslashes in download_path have to be escaped. A raw string does not help here, because a raw string literal cannot end with a single backslash:

m.setParam('download_path', 'C:\\Users\\jonathan\\Desktop\\')
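
A raw string literal that ends in a single backslash is a syntax error in Python, so a Windows path with a trailing separator has to be written differently. The sketch below shows three equivalent spellings. It assumes that download_path should end with a separator, as in the examples above, which is not confirmed here.

```python
from pathlib import PureWindowsPath

# Escaped backslashes: valid even with a trailing separator.
escaped = 'C:\\Users\\jonathan\\Desktop\\'

# Raw string without the trailing backslash; the separator is appended afterwards.
raw_plus_sep = r'C:\Users\jonathan\Desktop' + '\\'

# Built from parts; str() of a PureWindowsPath has no trailing separator,
# so it is appended here as well.
from_parts = str(PureWindowsPath('C:/', 'Users', 'jonathan', 'Desktop')) + '\\'

# All three spellings produce the same string.
print(escaped == raw_plus_sep == from_parts)
```

Any of the three strings can then be passed as the value for 'download_path'.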

Where do I get the course ref id?

Parameters

Parameters are set with the .setParam(param, value) method, where param is one of the following:

  • 'num_scan_threads': number of threads used to scan the folders for files (default: 5).
  • 'num_download_threads': number of threads used to download the files (default: 5).
  • 'download_path': the path all files are downloaded to (default: the current working directory).
  • 'tutor_mode': downloads all submissions for each task unit once the deadline has expired (default: False).
  • 'verbose': prints information while scanning the courses (default: False).

For example:

from IliasDownloaderUniMA import IliasDownloaderUniMA

m = IliasDownloaderUniMA()
m.setParam('download_path', '/Users/jonathan/Desktop/')
m.setParam('num_scan_threads', 20)
m.setParam('num_download_threads', 20)
m.setParam('tutor_mode', True)
m.login('jhelgert', 'my_password')
m.addAllSemesterCourses()
m.downloadAllFiles()

Advanced Usage

Since some lecturers don't use ILIAS, it's possible to plug in an external scraper function via the addExternalScraper(fun, *args) method. Here fun is the external scraper function and *args are the arguments passed on to it. Note that your scraper must take course_name as its first argument. It is expected to return a list of dicts with the following keys:

# 'course': <the course name>
# 'type': 'file'
# 'name': <name of the parsed file>
# 'size': <file size (in MB) as float>
# 'mod-date': <the modification date as datetime object>
# 'url': <file url>
# 'path': <path where you want to download the file>

Here's an example:

from IliasDownloaderUniMA import IliasDownloaderUniMA
from urllib.parse import urljoin
from bs4 import BeautifulSoup
from dateparser import parse
import requests

def myExtScraper(course_name, url):
	"""
	Extracts all file links from the given url.
	"""
	files = []
	file_extensions = (".pdf", ".zip", ".tar.gz", ".cc", ".hh", ".cpp", ".h")
	soup = BeautifulSoup(requests.get(url).content, "lxml")
	for link in [i for i in soup.find_all(href=True) if i['href'].endswith(file_extensions)]: 
		file_url = urljoin(url, link['href'])
		resp = requests.head(file_url)
		files.append({
			'course': course_name,
			'type': 'file',
			'name': file_url.split("/")[-1],
			'size': 1e-6 * float(resp.headers['Content-Length']),
			'mod-date': parse(resp.headers['Last-Modified']),
			'url': file_url,
			'path': course_name + '/'
		})
	return files

m = IliasDownloaderUniMA()
m.login("jhelgert", "my_password")
m.addAllSemesterCourses()
m.addExternalScraper(myExtScraper, "OOP for SC", "https://conan.iwr.uni-heidelberg.de/teaching/oopfsc_ws2020/")
m.downloadAllFiles()
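
Servers do not always send Content-Length or Last-Modified, in which case a direct lookup like resp.headers['Content-Length'] raises a KeyError. The following sketch builds one file dict with the keys listed above from a headers mapping and falls back to defaults when a header is missing. The helper entry_from_headers, the example URL, and the fallback values are assumptions made for illustration, and the date is parsed with the standard library instead of dateparser. Whether the package expects timezone-aware or naive datetimes is not confirmed here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

REQUIRED_KEYS = {'course', 'type', 'name', 'size', 'mod-date', 'url', 'path'}

def entry_from_headers(course_name, file_url, headers):
    """Build one file dict from HTTP response headers (hypothetical helper)."""
    # Content-Length may be missing; fall back to 0.0 MB (assumed default).
    size_mb = 1e-6 * float(headers.get('Content-Length', 0))
    # Last-Modified may be missing too; fall back to the current time (assumed default).
    last_modified = headers.get('Last-Modified')
    if last_modified:
        mod_date = parsedate_to_datetime(last_modified)
    else:
        mod_date = datetime.now(timezone.utc)
    return {
        'course': course_name,
        'type': 'file',
        'name': file_url.split('/')[-1],
        'size': size_mb,
        'mod-date': mod_date,
        'url': file_url,
        'path': course_name + '/',
    }

# Example with made-up header values; no network access is needed.
entry = entry_from_headers(
    'OOP for SC',
    'https://example.org/slides/lecture01.pdf',
    {'Content-Length': '2500000', 'Last-Modified': 'Wed, 21 Oct 2020 07:28:00 GMT'},
)
print(sorted(entry))
```

Inside a scraper, the headers argument could be resp.headers from requests.head(file_url), since that mapping also supports .get().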

Contribute

Feel free to contribute in any form! Feature requests, bug reports, and PRs are more than welcome.