DataSpider

This project aims to develop a tool that lets a user block specific content from their feed, such as words or phrases, competitor ads, and irritating pop-ups. Along the way it explores web scraping and its possibilities. It started as webinar material for M.Sc. Data Analytics students.

Collecting data from websites using an automated process is known as web scraping.
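As a rough illustration of the idea, the sketch below pulls every link out of a page using only the Python standard library. The names `LinkCollector`, `extract_links`, and `SAMPLE_HTML` are made up for this example and are not part of the project; the HTML is an inline sample rather than a live page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical page content, standing in for HTML fetched from a website
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/a">First</a>
  <p>No link here</p>
  <a href="https://example.com/b">Second</a>
</body></html>
"""


def extract_links(html):
    """Return all link targets found in the given HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


print(extract_links(SAMPLE_HTML))
```

A real scraper would fetch the HTML over the network first; pages that build their content with JavaScript usually need a browser-driven tool such as Selenium.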

Selenium Documentation: https://selenium-python.readthedocs.io/

Installation instructions for Selenium:

1) pip install selenium
2) Download the Chrome web driver (chromedriver) from https://sites.google.com/chromium.org/driver/downloads?authuser=0

My Google Chrome version is 'Version 99.0.4844.74 (Official Build) (arm64)'. Download the chromedriver release that matches your own Chrome version.

If you are using a MacBook, you will need to unquarantine chromedriver before using it for the first time. Open a terminal window in the folder where you keep chromedriver and run: xattr -d com.apple.quarantine chromedriver
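Once Selenium and chromedriver are set up, a quick check that everything works could look like the sketch below. It assumes Selenium 4 and that chromedriver sits in the current folder; the function name `fetch_page_title`, the default path `./chromedriver`, and the example URL are assumptions for illustration, not part of the project.

```python
def fetch_page_title(url, driver_path="./chromedriver"):
    """Open `url` in Chrome via Selenium and return the page title.

    Selenium is imported inside the function so this file can still be
    loaded on a machine where Selenium is not installed yet.
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    # driver_path is an assumed location; point it at your own chromedriver
    service = Service(executable_path=driver_path)
    driver = webdriver.Chrome(service=service)
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always close the browser, even if loading the page fails
        driver.quit()


# Example call (opens a Chrome window, so it is left commented out):
# print(fetch_page_title("https://example.com"))
```

If Chrome opens and a title comes back, the driver version and the quarantine step are both fine.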

To access APIs, use Postman, which can be downloaded from: https://www.postman.com/downloads/
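The kind of GET request you would send from Postman can also be issued from Python. The sketch below uses only the standard library; `build_request`, `fetch_json`, and the URL `https://example.com/api` are hypothetical placeholders, and it assumes the API returns JSON.

```python
import json
import urllib.request


def build_request(url):
    """Create a GET request that asks the server for a JSON response."""
    request = urllib.request.Request(url, method="GET")
    request.add_header("Accept", "application/json")
    return request


def fetch_json(url, timeout=10):
    """Send the request and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example call, left commented out because the URL is only a placeholder:
# data = fetch_json("https://example.com/api")
```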

To access a Jupyter notebook instance online: https://jupyter.org/try
