This project builds a data pipeline for football data from FBref using Docker, PostgreSQL, Apache Airflow, and Azure Storage. It fetches, processes, and stores the data in an automated, scalable way.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
Install the following before you begin:
- Docker
- Azure CLI
- Python 3.7 or higher
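
A quick way to confirm the prerequisites above is a small check script. The sketch below is only an illustration: it assumes the Docker and Azure CLI executables are named `docker` and `az` on your `PATH`, and the helper names `missing_tools` and `python_ok` are hypothetical, not part of this repository.

```python
import shutil
import sys

# Executable names assumed for the prerequisites listed above.
REQUIRED_TOOLS = ["docker", "az"]
MIN_PYTHON = (3, 7)


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def python_ok(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    if not python_ok():
        print("Python 3.7 or higher is required")
    if not missing and python_ok():
        print("All prerequisites found")
```

The `which` parameter exists only so the lookup can be swapped out when testing; in normal use the defaults are enough.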
To get a development environment running:
- Clone the repository to your local machine: `git clone https://github.com/felipefe20/Azure-Dataeng-Football-Project.git`
## Running the Code With Docker
1. Start your services on Docker:
```bash
docker compose up -d
```
2. Register the database in PostgreSQL at http://localhost:5050.
3. Create the Azure resources (ADLS) using the Azure CLI.
4. Trigger the DAG in the Airflow UI at http://localhost:8080.
5. The pipeline then loads the data to PostgreSQL and ADLS. It:
- Fetches data from FBref.
- Cleans the data.
- Transforms the data.
- Loads data to PostgreSQL.
- Pushes the data to Azure Data Lake.
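
The fetch, clean, transform, and load stages listed above can be sketched as plain Python functions. This is a minimal illustration under stated assumptions, not the project's actual DAG code: the sample rows, the column names, the `player_stats` table, and the goals-per-90 metric are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL so the example runs without any services. The push to Azure Data Lake is left out because it needs real credentials.

```python
import sqlite3

# Hypothetical raw rows standing in for data fetched from FBref.
RAW_ROWS = [
    {"player": " Ada Example ", "team": "Team A", "goals": "3", "minutes": "270"},
    {"player": "Bo Sample", "team": "Team B", "goals": "", "minutes": "180"},
    {"player": "", "team": "Team A", "goals": "1", "minutes": "90"},
]


def fetch():
    """Fetch stage: the real pipeline would request data from FBref here."""
    return list(RAW_ROWS)


def clean(rows):
    """Clean stage: trim text, drop rows with no player, treat blanks as 0."""
    cleaned = []
    for row in rows:
        player = row["player"].strip()
        if not player:
            continue
        cleaned.append({
            "player": player,
            "team": row["team"].strip(),
            "goals": int(row["goals"] or 0),
            "minutes": int(row["minutes"] or 0),
        })
    return cleaned


def transform(rows):
    """Transform stage: add a derived goals-per-90-minutes value."""
    for row in rows:
        minutes = row["minutes"]
        row["goals_per_90"] = round(row["goals"] * 90 / minutes, 2) if minutes else 0.0
    return rows


def load(rows, conn):
    """Load stage: write rows to a table (SQLite here, PostgreSQL in the project)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS player_stats "
        "(player TEXT, team TEXT, goals INTEGER, minutes INTEGER, goals_per_90 REAL)"
    )
    conn.executemany(
        "INSERT INTO player_stats "
        "VALUES (:player, :team, :goals, :minutes, :goals_per_90)",
        rows,
    )
    conn.commit()


def run_pipeline(conn):
    """Run the stages in order and return the loaded rows."""
    rows = transform(clean(fetch()))
    load(rows, conn)
    return rows


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    run_pipeline(connection)
    print(connection.execute("SELECT player, goals_per_90 FROM player_stats").fetchall())
```

Keeping each stage as its own function mirrors how the stages could map to separate Airflow tasks, so a failure in one stage can be retried without repeating the others.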