This is a Python Flask web application that extracts data from the first 5 videos on a YouTube search page.
The application extracts the following data for each of the 5 videos:
- Link to the video
- Thumbnail image URL
- Title of the video
- Number of views
- Time of posting
The extracted data is then saved in a CSV file named youtube_scrap.csv.
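The CSV step can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the `VideoRecord` class, its field names, and the `save_records` helper are assumptions chosen to mirror the five fields listed above.

```python
import csv
from dataclasses import asdict, dataclass, fields
from pathlib import Path


@dataclass
class VideoRecord:
    """One scraped video. Field names are illustrative, not the project's."""
    link: str
    thumbnail_url: str
    title: str
    views: str
    posted: str


def save_records(records, path="youtube_scrap.csv", limit=5):
    """Write at most `limit` records to a CSV file with a header row."""
    header = [f.name for f in fields(VideoRecord)]
    # newline="" lets the csv module control line endings on every platform.
    with Path(path).open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=header)
        writer.writeheader()
        for record in records[:limit]:
            writer.writerow(asdict(record))
    return path
```

Views and posting time are kept as strings here because the scraped page presents them as display text; converting them to numbers or dates would be a separate step.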
The project has been deployed on AWS with all of the functionality listed above.
Project GitHub Repository Link - GitHub repo link
AWS Deployment Link - Click here for live site (Link Down)
This application requires the following Python libraries:
- Flask
- Flask-Cors
- BeautifulSoup
- Selenium
- ChromeDriver Manager
The project was built and deployed with:
- Python 3.7
- GitHub v1
- AWS Elastic Beanstalk
- AWS CodePipeline
- Clone this repository.
- Install the dependencies.
$ pip install -r requirements.txt
- Download the ChromeDriver build that matches your installed Chrome version and add it to your system PATH.
- Run the application:
$ python application.py
- Open your web browser and go to http://127.0.0.1:8000 to see the application running.
- Input a search query in the text box and click the "Scrape" button.
- The application will scrape YouTube for the top 5 videos related to the search query, and display their links, titles, thumbnails, views and posting times.
- The scraped data will also be saved in a CSV file named youtube_scrap.csv in the same directory as the application.py file.
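The flow described in these steps could look roughly like the Flask sketch below. It is an assumption-laden outline rather than the contents of application.py: the `/scrape` route, the `query` form field, and the `build_search_url` and `scrape_videos` helpers are hypothetical names, and the scraping step is stubbed out so the example runs without a browser.

```python
from urllib.parse import urlencode

from flask import Flask, jsonify, request

app = Flask(__name__)

MAX_VIDEOS = 5  # only the first 5 search results are kept


def build_search_url(query):
    """Return the YouTube search results URL for a query string."""
    return "https://www.youtube.com/results?" + urlencode({"search_query": query})


def scrape_videos(search_url):
    """Hypothetical placeholder for the Selenium/BeautifulSoup step.

    A real implementation would load `search_url` in a browser and parse
    the rendered page; here it returns no rows so the sketch stays runnable.
    """
    return []


@app.route("/scrape", methods=["POST"])  # route name is an assumption
def scrape():
    # Read the search text submitted from the form's text box.
    query = request.form.get("query", "").strip()
    if not query:
        return jsonify(error="empty search query"), 400
    videos = scrape_videos(build_search_url(query))[:MAX_VIDEOS]
    return jsonify(query=query, videos=videos)

# To serve it at the address used above, one would call:
# app.run(host="127.0.0.1", port=8000)
```

Returning JSON keeps the sketch short; the application itself displays the results in the page, so its handler would presumably render a template instead.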
Aditya Azad - Initial work
This program was created as a learning exercise, and contributions are not currently being accepted. However, you are welcome to use and modify the code for your own purposes.
This project was a part of the Data Science course provided by PW Skills.