youtubeScrappingWebsite-Public

This project was developed to learn web scraping, Flask API integration, and working with the third-party Google API.

This project provides an overview of how to implement web scraping using Python.

Introduction to web scraping: web scraping means extracting data from a website so that we can run our own analysis on it.
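To make the idea concrete, here is a minimal sketch of the extraction step. It is not the project's code: it uses only Python's standard-library `html.parser` instead of the scraping libraries the project depends on, and the sample HTML, the `video` class name, and the `extract_video_links` helper are all hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page content; a real scraper would fetch this over HTTP first.
SAMPLE_HTML = """
<html><body>
  <a class="video" href="/watch?v=abc">First video</a>
  <a class="video" href="/watch?v=def">Second video</a>
  <a class="other" href="/about">About</a>
</body></html>
"""


class VideoLinkParser(HTMLParser):
    """Collects (title, href) pairs from <a class="video"> elements."""

    def __init__(self):
        super().__init__()
        self.links = []            # extracted (title, href) pairs
        self._current_href = None  # href of the video link being read, if any
        self._text = []            # text fragments seen inside that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "video":
            self._current_href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside a matching link.
        if self._current_href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._current_href is not None:
            title = "".join(self._text).strip()
            self.links.append((title, self._current_href))
            self._current_href = None


def extract_video_links(html):
    """Return the (title, href) pairs found in the given HTML string."""
    parser = VideoLinkParser()
    parser.feed(html)
    return parser.links
```

Calling `extract_video_links(SAMPLE_HTML)` returns only the two links marked with the `video` class. Once the data is in a plain Python list like this, it can be stored or analysed.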

  • app.py is the core API layer, which accepts a request and returns a response, or downloads or uploads a file
  • Operations:
    • "/" : Redirects to the home screen
    • "/channel" : Redirects to the searched channel
    • "/channel/videos" : Redirects to the channel videos
    • "/channel/video/comments" : Redirects to the comments of a channel video
    • "/channel/video/download" : Downloads the channel video
    • "/channel/video/s3upload" : Uploads to the S3 bucket
  • AppConfig.py holds the application configuration
  • DbModel.py, MongoDbModel.py, SnowflakeDbModel.py hold the database connection settings and CRUD operations
  • YTChannels.py is the core file, which handles the extraction process from YouTube, the upload to S3, and saving to the database
  • YTExceptions.py, YTLogger.py handle exceptions and logging
  • conf.ini holds the application configuration information
  • generate_secrets.py generates the secrets
  • requirements.txt lists the application's package dependencies
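The routes listed above could be wired up in Flask roughly as follows. This is only a sketch under assumptions, not the contents of the project's app.py: the `name` query parameter, the handler names, and the JSON placeholders are hypothetical, and only three of the routes are shown.

```python
from flask import Flask, jsonify, redirect, request, url_for

app = Flask(__name__)


@app.route("/")
def home():
    # Placeholder for the home screen.
    return jsonify(page="home")


@app.route("/channel")
def channel():
    # Hypothetical "name" query parameter; send the user home if it is missing.
    name = request.args.get("name")
    if not name:
        return redirect(url_for("home"))
    return jsonify(channel=name)


@app.route("/channel/videos")
def channel_videos():
    name = request.args.get("name", "")
    # A real handler would call into the extraction layer here;
    # this sketch just returns an empty list.
    return jsonify(channel=name, videos=[])
```

With Flask's built-in test client, `app.test_client().get("/channel?name=demo")` returns a JSON body containing the channel name, while a request to "/channel" without the parameter is redirected to "/".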

Libraries used

  • requests==2.27.1
  • beautifulsoup4==4.11.1
  • mysql-connector-python
  • flask
  • requests_html
  • pytube
  • pybase64
  • boto3
  • cryptography
  • pymongo
  • pymongo[srv]
  • snowflake-connector-python
  • gunicorn==20.0.4