Project: Data Pipeline and Web Scraping
Author: Connor Finn
Date: February 27, 2020

This repo is an exercise I completed for my Data Science course, which involved creating a data pipeline and scraping a webpage for some 'interesting' data. The webpage I chose to scrape is a blog-style page describing some of the top restaurants in NYC. The scraped data is saved to CSV by the data pipeline class.
- web_scraper.ipynb
- ./data/restaurant_data.csv
- note that the file is saved in the data directory
Simply run the Jupyter notebook. If the data directory does not exist, the script will create the folder and the CSV file.
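
For reference, a minimal sketch of how the pipeline class might create the folder and save the file. The class name `DataPipeline`, its `save` method, and the default paths are illustrative assumptions, not the notebook's exact API:

```python
import os

import pandas as pd


class DataPipeline:
    """Minimal sketch of the pipeline class (the real class lives in web_scraper.ipynb)."""

    def __init__(self, folder="data", filename="restaurant_data.csv"):
        self.folder = folder
        self.path = os.path.join(folder, filename)

    def save(self, records):
        # Create the data folder if it does not already exist.
        os.makedirs(self.folder, exist_ok=True)
        # Organize the scraped records with pandas and write them to CSV.
        pd.DataFrame(records).to_csv(self.path, index=False)
        return self.path
```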
- This project uses the BeautifulSoup package (bs4), along with requests and lxml.html, to scrape and parse the webpage; a rough sketch of the flow is shown after this list.
- pandas is used to organize the data and write it to CSV.
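
As an illustration only, the scraping flow looks roughly like this. The URL and the `h2`/`p` selectors below are placeholders, not the actual blog page or HTML structure the notebook targets:

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Hypothetical blog URL; the notebook scrapes a specific NYC restaurant blog post.
URL = "https://example.com/best-nyc-restaurants"

response = requests.get(URL)
response.raise_for_status()

# Parse the page with BeautifulSoup, using lxml as the underlying parser.
soup = BeautifulSoup(response.text, "lxml")

# Example extraction: assume each restaurant appears as an <h2> heading
# followed by a descriptive <p> tag (selectors depend on the real page).
records = []
for heading in soup.find_all("h2"):
    description = heading.find_next("p")
    records.append({
        "name": heading.get_text(strip=True),
        "description": description.get_text(strip=True) if description else "",
    })

# Organize the results with pandas and write them to CSV.
pd.DataFrame(records).to_csv("./data/restaurant_data.csv", index=False)
```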