edX Web Scraping Project

Patrick Masi-Phelps

NYC Data Science Academy

The purpose of this exercise was to scrape edX's website for information on the online courses currently offered, and then conduct exploratory data analysis on the scraped data. This information could be useful to educational institutions: a clear picture of the current supply and characteristics of MOOCs could better inform current and potential market participants. It could also be useful to students looking to understand the availability of alternative online options.

The "edXScraper" Python notebook contains the code used to scrape edX.

The "edXcourses_working" Python notebook contains the code used to clean and manipulate the scraped data, and then to produce some basic visualizations and data analysis.

The .pkl file contains the master dataframe used for all visualizations and analysis. This file has already been through the cleaning and manipulation process outlined in "edXcourses_working".

The .csv file contains the initial scraped data from edX, with some minor tweaks.
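The two data files described above can be loaded without rerunning the notebooks. The following is a minimal sketch, assuming pandas is installed and that the repository directory contains exactly one .pkl and one .csv file. The helper name `load_course_data` is hypothetical, and because this README does not state the file names, the sketch finds the files by extension instead of guessing them.

```python
from pathlib import Path

import pandas as pd


def load_course_data(repo_dir="."):
    """Return (cleaned, raw) dataframes loaded from repo_dir.

    Assumption: repo_dir holds exactly one .pkl file (the cleaned master
    dataframe) and exactly one .csv file (the initial scraped data).
    """
    repo = Path(repo_dir)
    pkl_files = sorted(repo.glob("*.pkl"))
    csv_files = sorted(repo.glob("*.csv"))
    if len(pkl_files) != 1 or len(csv_files) != 1:
        raise FileNotFoundError(
            f"expected one .pkl and one .csv in {repo}, "
            f"found {len(pkl_files)} and {len(csv_files)}"
        )
    # Unpickling can execute arbitrary code, so only load pickles you trust.
    cleaned = pd.read_pickle(pkl_files[0])
    raw = pd.read_csv(csv_files[0])
    return cleaned, raw


if __name__ == "__main__":
    cleaned, raw = load_course_data(".")
    # Compare the shape of the cleaned data against the raw scrape.
    print(cleaned.shape, raw.shape)
```

A pickle written by one version of pandas may fail to load under a very different version. If that happens, the .csv file together with the "edXcourses_working" notebook is the fallback for rebuilding the cleaned dataframe.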

The PDF contains a presentation outlining the process and findings.