Warning: Image scraping functionality is not yet operational. Blame the cookie monster.
- Install the pdfkit system dependencies using this guide: wkhtmltopdf downloads.
- Install the required Python packages and browser binaries:
  pip3 install -r requirements
  python3 -m playwright install
- Edit run.py to include your login credentials, the target eTextbook, and the number of pages you wish to scrape:
  asyncio.run(scrape_textbook_runner(username='', password='', textbook_name='eTextbook: Introduction to Algorithms and Data Structures', num_pages=10))
- To run the script:
  python3 run.py
- To see the bot in action, set the headless arg to False in the following line of code:
  browser = await p.chromium.launch(headless=True)
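
If you switch between headless and visible runs often, one option is to read the setting from an environment variable instead of editing the source each time. The snippet below is a minimal sketch, not part of this project: the helper name `headless_from_env` and the variable name `SCRAPER_HEADLESS` are hypothetical and only used for illustration.

```python
import os


def headless_from_env(var: str = "SCRAPER_HEADLESS", default: bool = True) -> bool:
    """Return the headless setting based on an environment variable.

    `SCRAPER_HEADLESS` is a hypothetical name chosen for this sketch.
    Values such as "0", "false", "no", or "off" (any case) mean a visible
    browser; any other value means headless. If the variable is unset,
    the default is returned.
    """
    raw = os.environ.get(var)
    if raw is None:
        return default
    return raw.strip().lower() not in {"0", "false", "no", "off"}


# Assumed usage inside the existing launch line:
#   browser = await p.chromium.launch(headless=headless_from_env())
```

With this in place, running `SCRAPER_HEADLESS=false python3 run.py` would open a visible browser window, while a plain `python3 run.py` would keep the current headless behavior.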