A simple spider to make it easy to get the course materials organized.
It downloads the course cases, PDFs and other materials hosted on S3. YouTube videos are not downloaded.
- clone this repo
- download the page HTML from
  https://latam.ds4a.io/lesson-plan
  and save it in the root folder of the repo
- run the command
  pip install -r requirements.txt
  to install dependencies
- run the command
  scrapy runspider localSpider.py
  to download the course content
- run
  python unpack.py
  and the zip files will be extracted to the ./extracted folder
- authenticate with user credentials
- add extraction capabilities
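
For readers curious about what the unpacking step involves, here is a minimal sketch using only the Python standard library. It is not the contents of unpack.py: the function name `unpack_all`, its parameters, and the choice to give each archive its own subfolder are assumptions made for illustration.

```python
import zipfile
from pathlib import Path


def unpack_all(source_dir: str = ".", target_dir: str = "./extracted") -> list:
    """Extract every .zip found under source_dir into a subfolder of target_dir.

    Hypothetical helper: the real unpack.py may lay files out differently.
    """
    source = Path(source_dir)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    extracted = []
    for archive in sorted(source.rglob("*.zip")):
        # Skip archives that already live inside the output folder.
        if target.resolve() in archive.resolve().parents:
            continue
        # Skip files that have a .zip suffix but are not valid archives.
        if not zipfile.is_zipfile(archive):
            continue
        # One subfolder per archive, named after the zip file.
        destination = target / archive.stem
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(destination)
        extracted.append(destination)
    return extracted
```

Calling `unpack_all()` from the root folder of the repo would extract every downloaded archive into ./extracted and return the list of folders it created.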