A Python-based, customizable script for scraping links to videos hosted on any website, implemented using Scrapy and BeautifulSoup.
I created this script for a friend who hadn't watched all the videos in the Maths section of gre.magoosh.com and was worried that his subscription was about to end. Downloading every video by hand took far too many steps, so I spent about two hours building this script. It scrapes the direct links to the videos (in this case hosted on CloudFront), so that he could download them all in one go.
- A valid subscription is necessary for downloading videos from the site.
- Scrapy and BeautifulSoup must be installed for the script to work (both are available on PyPI as `scrapy` and `beautifulsoup4`).
See Scrapy's documentation to learn how to execute spiders and crawlers.
- You can adapt the code to scrape other categories on the site. As written, it scrapes only the videos in the Mathematics section of gre.magoosh.com.
- The same approach (crawl only the relevant pages, then pick out only the elements you need) can be adapted to other sites and other kinds of data.
- Only the data you need is extracted and everything else is ignored, which keeps the run short and limits the bandwidth the script uses.
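The filtering idea above can be illustrated with a small, self-contained sketch. This is not the repository's code: it swaps in the standard library's `html.parser` for Scrapy and BeautifulSoup so it runs without third-party packages, and it assumes the videos are served from a CloudFront host (`*.cloudfront.net`) with common video file extensions. The helpers `collect_urls`, `lesson_links` and `video_links`, and the `/lessons/math` category prefix in the usage example, are hypothetical; the real site's URL layout may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: videos are served from a CloudFront distribution,
# i.e. hosts ending in ".cloudfront.net". Adjust for the target site.
VIDEO_HOST_SUFFIX = ".cloudfront.net"
# Assumption: typical video file extensions; adjust for the target site.
VIDEO_EXTENSIONS = (".mp4", ".webm", ".m3u8")


class LinkCollector(HTMLParser):
    """Collects absolute URLs from <a href>, <video src> and <source src>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a":
            url = attrs.get("href")
        elif tag in ("video", "source"):
            url = attrs.get("src")
        else:
            return  # every other tag is ignored
        if url:
            # Resolve relative links against the page they came from.
            self.urls.append(urljoin(self.base_url, url))


def collect_urls(html, base_url):
    """Return every collected URL once, in first-seen order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    # dict.fromkeys drops duplicates while keeping insertion order.
    return list(dict.fromkeys(collector.urls))


def lesson_links(html, base_url, category_prefix):
    """Hypothetical: keep only pages whose path starts with category_prefix."""
    return [u for u in collect_urls(html, base_url)
            if urlparse(u).path.startswith(category_prefix)]


def video_links(html, base_url):
    """Keep only CloudFront-hosted URLs that point at a video file."""
    result = []
    for u in collect_urls(html, base_url):
        parsed = urlparse(u)
        host = parsed.hostname or ""
        if (host.endswith(VIDEO_HOST_SUFFIX)
                and parsed.path.lower().endswith(VIDEO_EXTENSIONS)):
            result.append(u)
    return result


if __name__ == "__main__":
    page = """
    <a href="/lessons/math-101">Math</a>
    <a href="/lessons/verbal-7">Verbal</a>
    <video><source src="https://d1234.cloudfront.net/v/math-101.mp4"></video>
    <img src="https://d1234.cloudfront.net/img/logo.png">
    """
    print(lesson_links(page, "https://example.com/", "/lessons/math"))
    # ['https://example.com/lessons/math-101']
    print(video_links(page, "https://example.com/"))
    # ['https://d1234.cloudfront.net/v/math-101.mp4']
```

Changing the category only means changing the prefix passed to `lesson_links`; the video filter stays the same. In a Scrapy spider, the same two filters would typically decide which links to follow and which to yield as results.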
Note to contributors: Please update the documentation whenever necessary.