
This project aims to develop a tool that identifies a movie or song from an accurate description or from humming, using web scraping and audio processing techniques.

Primary language: Jupyter Notebook. License: MIT.

DataSpider

This project aims to develop a tool that lets users block specific content from their feed, such as words or phrases, competitor ads, and irritating pop-ups. It also explores web scraping and its possibilities. It started as webinar material for M.Sc. Data Analytics students.

Collecting data from websites using an automated process is known as web scraping.
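As a minimal sketch of the idea, the snippet below pulls every link out of a small HTML string using only the Python standard library. The names `LinkCollector`, `extract_links`, and `SAMPLE_HTML` are made up for illustration, and the sample page is hard-coded; a real scraper would download the page first.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs arrives as a list of (name, value) pairs
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html):
    """Return the list of link targets found in an HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links


# Hypothetical page content, hard-coded so the example runs offline.
SAMPLE_HTML = """
<html><body>
  <a href="https://example.com/news">News</a>
  <a href="/about">About</a>
  <a>No link here</a>
</body></html>
"""

print(extract_links(SAMPLE_HTML))
```

The same pattern (fetch a page, parse it, keep the parts you need) is what browser automation tools such as Selenium carry out against live, JavaScript-driven pages.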

Selenium Documentation: https://selenium-python.readthedocs.io/

Installation instructions for Selenium:

  1. Install the package: pip install selenium
  2. Download the Chrome web driver from https://sites.google.com/chromium.org/driver/downloads?authuser=0
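Once both steps are done, a quick way to check the setup is to open a page and read its title. The sketch below assumes Selenium 4 (where the driver location is passed through a `Service` object) and assumes the driver was saved as `./chromedriver`; `selenium_available` and `fetch_title` are illustrative helper names, not part of this project.

```python
import importlib.util


def selenium_available():
    """Return True if the selenium package can be imported."""
    return importlib.util.find_spec("selenium") is not None


def fetch_title(url, driver_path="./chromedriver"):
    """Open url in headless Chrome and return the page title.

    driver_path is an assumed location for the downloaded chromedriver.
    """
    # Imported lazily so this file still loads when selenium is missing.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")  # run without opening a window
    driver = webdriver.Chrome(
        service=Service(executable_path=driver_path), options=options
    )
    try:
        driver.get(url)
        return driver.title
    finally:
        driver.quit()  # always shut the browser down, even on errors


# Example call (needs Chrome, chromedriver, and network access):
# print(fetch_title("https://example.com"))
```

If `selenium_available()` returns False, the `pip install selenium` step has not taken effect in the active Python environment.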

The Google Chrome version used here is 'Version 99.0.4844.74 (Official Build) (arm64)'. Download the driver that matches your own Chrome version.
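ChromeDriver is generally expected to share its major version number with the installed browser, so it can help to compare the two before running anything. The sketch below is only an illustration: `major_version` is a hypothetical helper, and the driver string is an example value, not output captured from this project.

```python
import re


def major_version(version_text):
    """Return the major version number found in version_text, or None."""
    # Look for a four-part version such as 99.0.4844.74
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    return int(match.group(1)) if match else None


chrome = "Version 99.0.4844.74 (Official Build) (arm64)"
driver = "ChromeDriver 99.0.4844.51"  # example driver version string

# True when browser and driver share the same major version
print(major_version(chrome) == major_version(driver))
```

The browser string can be copied from the "About Google Chrome" page, and the driver string from running `chromedriver --version` in a terminal.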

If you are using a MacBook, you need to unquarantine the Chrome driver before using it for the first time. Open a terminal window in the folder where you saved chromedriver and run: xattr -d com.apple.quarantine chromedriver

For accessing APIs, use Postman, which can be downloaded from: https://www.postman.com/downloads/
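A request explored in Postman can later be reproduced from a notebook. The sketch below builds (but does not send) a GET request with the standard library; the endpoint `https://api.example.com/articles` and its parameters are placeholders, and `build_get_request` is a name made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_get_request(base_url, params):
    """Build (but do not send) a GET request with query parameters."""
    url = f"{base_url}?{urlencode(params)}"
    return Request(url, headers={"Accept": "application/json"}, method="GET")


# Hypothetical endpoint and parameters, for illustration only.
request = build_get_request(
    "https://api.example.com/articles", {"q": "news", "limit": 5}
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would actually send it; that step is left out here so the example runs without network access.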

To use a Jupyter Notebook instance online, visit: https://jupyter.org/try

References:

  1. https://realpython.com/python-web-scraping-practical-introduction/
  2. https://www.youtube.com/watch?v=Xjv1sY630Uc&list=PLzMcBGfZo4-n40rB1XaJ0ak1bemvlqumQ
  3. https://www.analyticsvidhya.com/blog/2021/12/text-classification-of-news-articles/
  4. https://www.kaggle.com/c/learn-ai-bbc/data
  5. https://github.com/DedSecInside/TorBot