/YouTube-Content-Scraper

The provided Python code is a web application built using the Flask framework, designed to scrape data from YouTube based on a specified company name. This application aims to automate the process of collecting information about videos associated with a particular company on YouTube and present the data to users in a downloadable CSV file.

Primary LanguageJupyter Notebook

YouTube Content Scraper

The provided Python code is a web application built using the Flask framework, designed to scrape data from YouTube based on a specified company name. This application aims to automate the process of collecting information about videos associated with a particular company on YouTube and present the data to users in a downloadable CSV file. Let's break down the code and its functionalities into paragraphs.

1. Importing Libraries: The code begins by importing necessary Python libraries. Flask is employed as the web framework for building the application, while other libraries like Pandas, BeautifulSoup, Selenium, and others are utilized for web scraping and data manipulation.

2. Flask App Setup: The app object is created using Flask. Route decorators @app.route(...) are employed to define two routes: the root route ("/") for the main functionality and a "/download" route for accessing the generated CSV file. The main functionality is implemented in the index route, handling both GET and POST requests.

3. Web Scraping Function (scrape_youtube_data): The core functionality of the application is encapsulated in the scrape_youtube_data function. This function utilizes Selenium, a web automation tool, to navigate through web pages. It initiates a Google search to find the YouTube page associated with the given company name. The code then extracts the link to this YouTube page and navigates to the videos section. Through automated scrolling, the script loads additional videos dynamically. The data (video link, title, views, and upload time) is then extracted from the HTML and organized into a Pandas DataFrame. Finally, this DataFrame is saved as a CSV file named "data.csv."

4. Routes and HTML Templates: The Flask application has two main routes - "index" and "download." The "index" route renders an HTML template ("index.html") that includes a form for user input (company name). Upon form submission (POST request), the application calls the scrape_youtube_data function with the provided company name, then redirects the user to the "/download" route. The "/download" route allows users to download the generated CSV file.

5. Main Block: The script concludes with a standard block that ensures the Flask app is only executed when the script is run directly, not when imported as a module. The app.run(debug=True) command starts the Flask development server in debug mode.

Note: It's important to highlight that web scraping activities should be conducted responsibly and in accordance with the terms of service of the targeted websites. Additionally, the code assumes the presence and proper configuration of the Chrome web driver for Selenium. Local adjustments may be necessary based on the specific environment.