The primary objective of this project was to gather and integrate Premier League data from several sources for detailed analysis, prediction, and visualization. A second goal was to build a dynamic, interactive dashboard that updates in real time as soon as the database is modified.
Architecture
Data is extracted from several football websites and stored in two locations: this repository and a cloud PostgreSQL database. Both are updated every weekend.
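The two-location storage described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the standings table, its columns, the file name standings.csv, and the helper names save_csv and save_db are all hypothetical, the sample rows are made up, and an in-memory SQLite database stands in for the cloud PostgreSQL instance so the sketch runs without credentials.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical column set; the real schema is not documented here.
FIELDS = ["team", "played", "points"]


def save_csv(records, csv_path):
    """Write the records to a CSV file (the copy kept in the repository)."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(records)


def save_db(records, conn):
    """Insert or overwrite the same records in a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS standings ("
        "team TEXT PRIMARY KEY, played INTEGER, points INTEGER)"
    )
    # INSERT OR REPLACE keeps a repeated weekly run from duplicating rows.
    conn.executemany(
        "INSERT OR REPLACE INTO standings (team, played, points) "
        "VALUES (:team, :played, :points)",
        records,
    )
    conn.commit()


# Made-up sample rows, for illustration only.
records = [
    {"team": "Team A", "played": 10, "points": 24},
    {"team": "Team B", "played": 10, "points": 18},
]

workdir = Path(tempfile.mkdtemp())
csv_path = workdir / "standings.csv"
conn = sqlite3.connect(":memory:")  # stand-in for the cloud PostgreSQL database

save_csv(records, csv_path)
save_db(records, conn)
save_db(records, conn)  # a second run leaves the row count unchanged

row_count = conn.execute("SELECT COUNT(*) FROM standings").fetchone()[0]
```

Writing both copies from the same list of records keeps the CSV files and the database in step after each weekend run. Against PostgreSQL, the INSERT OR REPLACE statement would need to become an INSERT ... ON CONFLICT clause, since INSERT OR REPLACE is SQLite syntax.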
Details
The ./Updating_with_BS4/extract_transform.py script scrapes data from football websites that hold Premier League data. The pages linked below are scraped with Beautiful Soup and Selenium, transformed with PySpark and pandas, and saved to the csv_dir folder.
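As a rough illustration of the extract-and-transform step, the sketch below parses an HTML table into rows and derives a typed record per team. It is not the contents of extract_transform.py: the sample markup, the column names, and the helpers extract_rows and transform are hypothetical, and the standard library's html.parser is swapped in for Beautiful Soup (with plain dictionaries in place of pandas or PySpark) so the example runs without third-party packages or network access.

```python
from html.parser import HTMLParser

# Made-up markup; the real pages and their layout are not shown in this README.
SAMPLE_HTML = """
<table id="standings">
  <tr><th>Team</th><th>Played</th><th>Points</th></tr>
  <tr><td>Team A</td><td>10</td><td>24</td></tr>
  <tr><td>Team B</td><td>10</td><td>18</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_data(self, data):
        # Text outside a cell (whitespace between tags) is ignored.
        if self._in_cell:
            self._row[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def extract_rows(html):
    """Extract step: return the table as a list of rows of cell strings."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows


def transform(rows):
    """Transform step: name the columns, cast numbers, add a derived field."""
    header, *body = rows
    keys = [h.lower() for h in header]
    records = []
    for row in body:
        rec = dict(zip(keys, row))
        rec["played"] = int(rec["played"])
        rec["points"] = int(rec["points"])
        rec["points_per_game"] = round(rec["points"] / rec["played"], 2)
        records.append(rec)
    return records


records = transform(extract_rows(SAMPLE_HTML))
```

Keeping extraction and transformation in separate functions means the parsing code can change when a site's layout changes without touching the cleaning logic.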
Drawing on my experience in data integration, visualization, and dashboard development, I completed this project and used the resulting data to support informed decisions related to the Premier League. The dashboard gives a comprehensive, intuitive view of key performance indicators, making it easy to spot trends, patterns, and opportunities for improvement.