YouTube Data Harvesting and Warehousing is a project that lets users access and analyse data from multiple YouTube channels. It combines SQL, MongoDB, and Streamlit in a user-friendly application for retrieving, storing, and querying YouTube channel and video data.
This project requires the following components:
The Streamlit library is used to build a user-friendly UI through which users interact with the application and carry out data retrieval and analysis.
Python is a powerful programming language known for being easy to learn and read. It is the primary language of this project and is used to develop the entire application, including data retrieval, processing, analysis, and visualisation.
The googleapiclient library (part of the google-api-python-client package) handles communication with Google APIs. In this project, its main purpose is to interact with the YouTube Data API v3 to retrieve channel details, video details, and comments. With googleapiclient, developers can access YouTube's extensive data resources directly from code.
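A minimal sketch of the retrieval step, assuming an API key from the Google Cloud console. `build("youtube", "v3", developerKey=...)`, `channels().list` and `playlistItems().list` are real google-api-python-client / Data API v3 calls; the helper names and the flattened field names (`channel_name`, `subscribers`, and so on) are hypothetical choices for this example, not the project's actual code. The client is passed in as a parameter so the helpers can be tried with a stand-in object.

```python
from typing import Any


def build_youtube_client(api_key: str) -> Any:
    """Create a YouTube Data API v3 client (requires google-api-python-client)."""
    # Imported here so the helpers below can be exercised without the package installed.
    from googleapiclient.discovery import build

    return build("youtube", "v3", developerKey=api_key)


def get_channel_details(youtube: Any, channel_id: str) -> dict:
    """Fetch one channel and flatten the fields this sketch cares about (hypothetical names)."""
    response = youtube.channels().list(
        part="snippet,statistics,contentDetails", id=channel_id
    ).execute()
    item = response["items"][0]  # raises if the ID matched no channel
    stats = item["statistics"]
    return {
        "channel_id": item["id"],
        "channel_name": item["snippet"]["title"],
        "description": item["snippet"].get("description", ""),
        # The API returns counts as strings, so convert them to integers here.
        "subscribers": int(stats.get("subscriberCount", 0)),
        "views": int(stats.get("viewCount", 0)),
        "total_videos": int(stats.get("videoCount", 0)),
        # Every channel's uploads are exposed as a playlist.
        "uploads_playlist_id": item["contentDetails"]["relatedPlaylists"]["uploads"],
    }


def get_video_ids(youtube: Any, playlist_id: str) -> list[str]:
    """Page through a playlist (e.g. the uploads playlist) and collect its video IDs."""
    video_ids: list[str] = []
    page_token = None
    while True:
        response = youtube.playlistItems().list(
            part="contentDetails",
            playlistId=playlist_id,
            maxResults=50,  # the largest page size this endpoint allows
            pageToken=page_token,
        ).execute()
        video_ids.extend(
            item["contentDetails"]["videoId"] for item in response.get("items", [])
        )
        page_token = response.get("nextPageToken")
        if not page_token:  # no further pages
            return video_ids
```

With a real key, `get_video_ids(yt, get_channel_details(yt, channel_id)["uploads_playlist_id"])` would list a channel's uploads; every request consumes API quota. Video statistics and comments could be fetched in the same style with `videos().list` and `commentThreads().list`.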
MongoDB Atlas is a fully managed cloud database service for MongoDB. In this project, it stores the raw data retrieved from the YouTube Data API v3 and serves as the data lake. Because Atlas is fully managed, developers get reliable, scalable storage and retrieval without having to operate the database themselves.
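A minimal sketch of the data-lake write with pymongo, assuming one document per channel keyed by its channel ID. The database and collection names (`youtube_data`, `channels`), the helper names, and the document layout are hypothetical. `replace_one(..., upsert=True)` inserts the document the first time and overwrites it when the channel is harvested again, so repeated runs do not create duplicates.

```python
from typing import Any


def get_collection(uri: str, db_name: str = "youtube_data", coll_name: str = "channels") -> Any:
    """Open the data-lake collection on an Atlas cluster (requires pymongo)."""
    from pymongo import MongoClient  # imported lazily; only needed for a real cluster

    return MongoClient(uri)[db_name][coll_name]


def build_lake_document(channel: dict, videos: list[dict], comments: list[dict]) -> dict:
    """Bundle everything harvested for one channel into a single document."""
    return {
        "_id": channel["channel_id"],  # one document per channel
        "channel": channel,
        "videos": videos,
        "comments": comments,
    }


def save_to_lake(collection: Any, document: dict) -> None:
    """Insert the document, or replace it if this channel was stored before."""
    collection.replace_one({"_id": document["_id"]}, document, upsert=True)
```

MongoDB caps a single document at 16 MB, so for channels with very large comment volumes, keeping videos and comments in separate collections may be a safer layout.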
PostgreSQL is an open-source, advanced, and highly scalable relational database management system (RDBMS) known for its reliability and extensive feature set. It stores and manages structured data, supports a wide range of data types, and offers advanced SQL capabilities. In this project, it holds the data migrated from the MongoDB data lake so that it can be queried efficiently.
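The migration step could look roughly like this: create the tables if needed, then upsert the channel row and its video rows using psycopg2's `%s` placeholders. It assumes each data-lake document has a `channel` dict and a `videos` list with the keys used below; that shape, the table layout, and the helper names are hypothetical. `INSERT ... ON CONFLICT` requires PostgreSQL 9.5 or later.

```python
from typing import Any

# Hypothetical schema: table and column names are illustrative only.
CREATE_TABLES = """
CREATE TABLE IF NOT EXISTS channels (
    channel_id   TEXT PRIMARY KEY,
    channel_name TEXT NOT NULL,
    subscribers  BIGINT,
    views        BIGINT,
    total_videos INTEGER
);
CREATE TABLE IF NOT EXISTS videos (
    video_id     TEXT PRIMARY KEY,
    channel_id   TEXT NOT NULL REFERENCES channels (channel_id),
    title        TEXT,
    published_at TIMESTAMPTZ,
    view_count   BIGINT,
    like_count   BIGINT
);
"""

UPSERT_CHANNEL = """
INSERT INTO channels (channel_id, channel_name, subscribers, views, total_videos)
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (channel_id) DO UPDATE SET
    channel_name = EXCLUDED.channel_name,
    subscribers  = EXCLUDED.subscribers,
    views        = EXCLUDED.views,
    total_videos = EXCLUDED.total_videos
"""

UPSERT_VIDEO = """
INSERT INTO videos (video_id, channel_id, title, published_at, view_count, like_count)
VALUES (%s, %s, %s, %s, %s, %s)
ON CONFLICT (video_id) DO UPDATE SET
    view_count = EXCLUDED.view_count,
    like_count = EXCLUDED.like_count
"""


def connect(dsn: str) -> Any:
    """Open a PostgreSQL connection (requires psycopg2)."""
    import psycopg2  # imported lazily so the migration logic can be tried without it

    return psycopg2.connect(dsn)


def migrate_document(conn: Any, document: dict) -> None:
    """Copy one data-lake document (channel plus videos) into the SQL tables."""
    channel = document["channel"]
    video_rows = [
        (v["video_id"], channel["channel_id"], v["title"], v["published_at"],
         v["view_count"], v["like_count"])
        for v in document.get("videos", [])
    ]
    with conn.cursor() as cur:
        cur.execute(CREATE_TABLES)  # no-op once the tables exist
        cur.execute(UPSERT_CHANNEL, (
            channel["channel_id"], channel["channel_name"], channel["subscribers"],
            channel["views"], channel["total_videos"],
        ))
        cur.executemany(UPSERT_VIDEO, video_rows)
    conn.commit()  # make the migrated rows visible to other sessions
```

Upserting rather than plain inserting lets the same channel be migrated again after a fresh harvest without primary-key violations.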
When scraping YouTube content, it is crucial to do so ethically and responsibly. Respecting YouTube's terms and conditions, obtaining appropriate authorization, and complying with data protection regulations are fundamental. Collected data must be handled responsibly: privacy and confidentiality must be protected, and any misuse or misrepresentation prevented. It is also important to consider the potential impact on the platform and its community and to keep the harvesting process fair and sustainable, for example by staying within the API's daily quota. Following these guidelines lets us extract valuable insights from YouTube data while maintaining integrity.
The Python libraries used in this project are:
1. googleapiclient.discovery
2. streamlit
3. psycopg2
4. pymongo
5. pandas
The project covers the following workflow:
- Retrieval of channel and video data from YouTube using the YouTube Data API.
- Storage of the data in a MongoDB database that acts as a data lake.
- Migration of the data from the data lake to a SQL database for efficient querying and analysis.
- Search and retrieval of data from the SQL database using different search options.
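One way to sketch the search options is a fixed menu of labelled, read-only queries that the Streamlit page runs against the SQL database. The labels, the queries, and the `channels`/`videos` tables and columns are hypothetical; the SQL avoids parameters and dialect-specific syntax so it runs unchanged on PostgreSQL (via psycopg2) or, for quick local checks, SQLite.

```python
from typing import Any

# Hypothetical search options: each label maps to one read-only query.
SEARCH_OPTIONS = {
    "Top 10 most viewed videos": """
        SELECT v.title, c.channel_name, v.view_count
        FROM videos v JOIN channels c ON c.channel_id = v.channel_id
        ORDER BY v.view_count DESC
        LIMIT 10""",
    "Number of videos per channel": """
        SELECT c.channel_name, COUNT(v.video_id) AS video_count
        FROM channels c LEFT JOIN videos v ON v.channel_id = c.channel_id
        GROUP BY c.channel_name
        ORDER BY video_count DESC""",
    "Channels by subscriber count": """
        SELECT channel_name, subscribers
        FROM channels
        ORDER BY subscribers DESC""",
}


def run_search(conn: Any, option: str) -> tuple[list[str], list[tuple]]:
    """Run the query behind a search option and return (column names, rows)."""
    cur = conn.cursor()
    try:
        cur.execute(SEARCH_OPTIONS[option])
        columns = [col[0] for col in cur.description]
        return columns, cur.fetchall()
    finally:
        cur.close()  # sqlite3 cursors are not context managers, so close explicitly


def main(conn: Any) -> None:
    """Streamlit page: pick a search option and show the result as a table."""
    import pandas as pd
    import streamlit as st

    option = st.selectbox("Choose a search option", list(SEARCH_OPTIONS))
    columns, rows = run_search(conn, option)
    st.dataframe(pd.DataFrame(rows, columns=columns))
```

In the app script, `main` would be called with an open database connection and the page started with `streamlit run`. Keeping the queries in a fixed dictionary, rather than accepting free-form SQL from the user, means the UI can only run statements the developer has reviewed.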
You can view a demo video of this project on my LinkedIn: https://www.linkedin.com/posts/raja-soundari-640152281_data-project-content-activity-7082046816392728576-TXv-?utm_source=share&utm_medium=member_desktop