Python web-scraping library for the University of Waikato.
It uses selenium and selenium-requests under the hood.
While on campus, the DUO two-factor authentication does not prompt the user,
which allows the library to be used in non-interactive mode (init_driver(False)).
When off campus, however, it must be run in interactive mode (init_driver(True))
so that you can tick the Remember me for 30 days box and click the Send me a push button
to accept the authentication on your mobile device.
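The choice between the two modes can be made at start-up rather than hard-coded. The following is a minimal sketch, not part of wai.scraper: the environment variable WAI_OFF_CAMPUS and the helpers use_interactive_mode and make_driver are hypothetical names introduced here for illustration. Only init_driver comes from the library.

```python
import os


def use_interactive_mode(environ=os.environ):
    """Return True if the (hypothetical) WAI_OFF_CAMPUS variable is set to a truthy value."""
    value = environ.get("WAI_OFF_CAMPUS", "")
    return value.strip().lower() in ("1", "true", "yes")


def make_driver(environ=os.environ):
    """Create a driver, interactive when off campus, non-interactive otherwise."""
    # imported lazily so use_interactive_mode() works without selenium installed
    import wai.scraper as ws
    return ws.init_driver(use_interactive_mode(environ))
```

With this sketch, a scheduled job on campus would leave the variable unset, while a manual run from home would set WAI_OFF_CAMPUS=1 so that the 2FA prompt can be answered in the browser window.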
The use of selenium was inspired by: https://stackoverflow.com/a/23929939/4698227
Create a virtual environment:
virtualenv -p /usr/bin/python3 venv
Install wai.scraper in the virtual environment:
./venv/bin/pip install git+https://github.com/fracpete/wai-scraper.git
The following example logs in to the university website via SSO and prints the HTML content of the staff landing page:
import getpass
import wai.scraper as ws
# initialize logger with debugging output
ws.init_logger(True)
# run Firefox in interactive mode (e.g. when off-campus, for interacting with 2FA)
driver = ws.init_driver(True)
# prompt for credentials and log in via SSO
user = input("Enter user: ")
pw = getpass.getpass("Enter password: ")
ws.sso(driver, user, pw, delay=15)
url = 'https://www.waikato.ac.nz/landing/staff.shtml'
# obtain staff landing page via selenium
ws.driver_get(driver, "staff landing page", url)
print("--> selenium")
print(driver.page_source)
# close the session
ws.close_driver(driver)
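If an exception is raised between init_driver and close_driver, the final call in the example above is never reached and the browser session may stay open. One way to guard against this is a small context manager. This is a sketch only: scraper_session is a hypothetical helper, not part of wai.scraper, and it takes the module as an argument (ws) so that it relies on nothing beyond the init_driver and close_driver calls shown above.

```python
from contextlib import contextmanager


@contextmanager
def scraper_session(ws, interactive):
    """Yield a driver from ws.init_driver() and always close it afterwards.

    ws is expected to be the wai.scraper module (or any object that offers
    init_driver and close_driver); scraper_session itself is hypothetical.
    """
    driver = ws.init_driver(interactive)
    try:
        yield driver
    finally:
        # runs on normal exit and when the with-body raises
        ws.close_driver(driver)
```

Used as "with scraper_session(ws, True) as driver:", the body would contain the ws.sso and ws.driver_get calls from the example, and the driver is closed even if one of them fails.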