Scraping data about online courses from classcentral.com using Scrapy, a framework for extracting data from websites.

Scraping Online Courses on Class Central

This project uses Scrapy to scrape data about online courses listed on classcentral.com.

The project has one spider that extracts textual data (name, platform, language, and URL) for every online course in a given subject.

The scraped data is available in the courses.csv file.
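If you want to work with the results in Python, the snippet below is a minimal sketch for loading courses.csv. It assumes the CSV header uses the field names name, platform, language and url; check the first line of your file and adjust the keys if the spider uses different names. The Course class and both helper functions are illustrative and not part of this project.

```python
import csv
from collections import Counter
from dataclasses import dataclass


@dataclass
class Course:
    # Assumed field names; they must match the CSV header written by the spider.
    name: str
    platform: str
    language: str
    url: str


def load_courses(path="courses.csv"):
    """Read the exported CSV into a list of Course objects."""
    with open(path, newline="", encoding="utf-8") as f:
        return [
            Course(row["name"], row["platform"], row["language"], row["url"])
            for row in csv.DictReader(f)
        ]


def courses_per_platform(courses):
    """Count how many scraped courses come from each platform."""
    return Counter(course.platform for course in courses)
```

For example, courses_per_platform(load_courses()) gives a quick count of how many courses each platform contributes to the scraped subject.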

How to use

You will need Python 3.x to run the scripts. Python can be downloaded from python.org.

You will also need Google Chrome to scrape the courses.

Install the Scrapy framework:

pip install scrapy

Install Selenium:

pip install selenium
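To confirm that both packages landed in the Python environment you will run Scrapy from, a small check like this one can help. missing_packages is a hypothetical helper written for this README, not part of the project; run it with the same interpreter that runs scrapy.

```python
import importlib.util


def missing_packages(names=("scrapy", "selenium")):
    """Return the names of required packages that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("Scrapy and Selenium are installed.")
```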

Download and save ChromeDriver, making sure its version matches your Google Chrome version.
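A ChromeDriver whose major version differs from Chrome's usually refuses to start a browser session. The sketch below compares the two major versions taken from the banners printed by each program's --version option. The helper names are hypothetical, and so are the command names you pass in: the Chrome executable is usually called google-chrome on Linux, but its name and location differ on macOS and Windows.

```python
import re
import subprocess


def major_version(version_text):
    """Extract the major version from a '--version' banner.

    Banners of this shape are handled:
      'Google Chrome 114.0.5735.198'
      'ChromeDriver 114.0.5735.90 (386bc09e...)'
    """
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_banner, driver_banner):
    """True when Chrome and ChromeDriver share the same major version."""
    return major_version(chrome_banner) == major_version(driver_banner)


def installed_banner(command):
    """Run e.g. ['chromedriver', '--version'] and return its output."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, versions_match(installed_banner(["google-chrome", "--version"]), installed_banner(["chromedriver", "--version"])) returns True when the two major versions agree.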

After downloading ChromeDriver, you must set the paths to ChromeDriver and Google Chrome on your machine.
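The README does not say where these paths are configured, so the following is only a sketch of one way to do it with the Selenium 4 API: Service(executable_path=...) points Selenium at ChromeDriver, and ChromeOptions.binary_location points it at the Chrome executable. The environment variable names CHROMEDRIVER_PATH and CHROME_BINARY, and both helper functions, are hypothetical; the project's spider may define the paths differently.

```python
import os
import shutil


def resolve_browser_paths(env=None):
    """Find ChromeDriver and Chrome, preferring explicit environment variables.

    CHROMEDRIVER_PATH and CHROME_BINARY are hypothetical names used only in
    this sketch; if they are unset, the executables are looked up on PATH.
    """
    env = os.environ if env is None else env
    driver = env.get("CHROMEDRIVER_PATH") or shutil.which("chromedriver")
    chrome = (
        env.get("CHROME_BINARY")
        or shutil.which("google-chrome")
        or shutil.which("chrome")
    )
    return driver, chrome


def make_driver(driver_path, chrome_path):
    """Start Chrome through Selenium 4 using the given paths."""
    # Imported here so the path helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    if chrome_path:
        options.binary_location = chrome_path
    # options.add_argument("--headless=new")  # optional: no browser window
    service = Service(executable_path=driver_path)
    return webdriver.Chrome(service=service, options=options)
```

Keeping path lookup separate from driver creation makes it easy to print the resolved paths first and check them before Chrome is launched.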

Once Scrapy, Selenium, and ChromeDriver are installed, clone or download this project, open the project folder in a command prompt/terminal, and run the following command:

scrapy crawl classcentral -o courses.csv

By default, this command scrapes all Data Science courses on Class Central. To scrape another listed subject, run the following command:

scrapy crawl classcentral -a subject="Subject Name" -o courses.csv

For example, to scrape the Health & Medicine courses, run:

scrapy crawl classcentral -a subject="Health & Medicine" -o courses.csv
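The quotes around the subject name matter in a shell: & is a special character in Unix shells and in the Windows command prompt, and the spaces would also split the value into several arguments. If you start the crawl from Python instead, a sketch like the following builds the argument list directly, which avoids shell quoting altogether. crawl_command and run_crawl are hypothetical helpers, not part of the project.

```python
import subprocess


def crawl_command(subject=None, output="courses.csv"):
    """Build the argument list for 'scrapy crawl classcentral'.

    Each list item reaches Scrapy as one argument, so subject names with
    spaces or '&' need no extra quoting.
    """
    cmd = ["scrapy", "crawl", "classcentral"]
    if subject is not None:
        cmd += ["-a", f"subject={subject}"]
    cmd += ["-o", output]
    return cmd


def run_crawl(subject=None, output="courses.csv"):
    """Run the crawl; call it from the project folder (the one with scrapy.cfg)."""
    subprocess.run(crawl_command(subject, output), check=True)
```

For example, run_crawl("Health & Medicine") runs the same crawl as the command above.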

You can change the output format to JSON or XML by changing the output file extension (e.g., courses.json or courses.xml).
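Re-running the crawl with a different extension is the simplest way to get another format. If you already have a courses.csv and do not want to crawl again, a standard-library sketch like this converts it to JSON. csv_to_json is a hypothetical helper, and every value stays a string because CSV carries no type information.

```python
import csv
import json


def csv_to_json(csv_path="courses.csv", json_path="courses.json"):
    """Convert an existing CSV export into a JSON array of objects."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        rows = list(csv.DictReader(src))
    with open(json_path, "w", encoding="utf-8") as dst:
        # ensure_ascii=False keeps accented course names readable.
        json.dump(rows, dst, ensure_ascii=False, indent=2)
    return len(rows)
```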