Data engineering project to analyze football data.

Primary language: Jupyter Notebook

Football Data Engineering-ETL

This project builds a data pipeline for football data from FBref using Docker, PostgreSQL, Apache Airflow, and Azure Storage. Airflow automates the pipeline, which fetches the data, cleans and transforms it, and stores the results in PostgreSQL and in Azure Data Lake Storage (ADLS).

Project Diagram

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

You need the following installed before running the project:

  • Docker
  • Azure CLI
  • Python 3.7 or higher
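A minimal sketch for checking these prerequisites from a shell. The function name `check_prereqs` is hypothetical and not part of the repository. It also checks for the Docker Compose v2 plugin, since the run instructions call `docker compose` rather than `docker-compose`.

```bash
# Hypothetical helper: checks the tools listed under Prerequisites.
# Prints one line per check and returns 0 only if everything is present.
check_prereqs() {
  local status=0 tool

  for tool in docker az python3; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok:      $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done

  # The run instructions use the Compose v2 plugin ("docker compose").
  if command -v docker >/dev/null 2>&1 &&
     ! docker compose version >/dev/null 2>&1; then
    echo "missing: docker compose (Compose v2 plugin)"
    status=1
  fi

  # Python must be 3.7 or higher; sys.version_info compares as a tuple.
  if command -v python3 >/dev/null 2>&1 &&
     ! python3 -c 'import sys; sys.exit(0 if sys.version_info >= (3, 7) else 1)'; then
    echo "too old: python3 (need 3.7+)"
    status=1
  fi

  return "$status"
}
```

For example, `check_prereqs && echo ready` prints one line per tool and then `ready` if nothing is missing.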

Installing

Follow these steps to get a development environment running:

  1. Clone the repository to your local machine and move into it:
     ```bash
     git clone https://github.com/felipefe20/Azure-Dataeng-Football-Project.git
     cd Azure-Dataeng-Football-Project
     ```

Running the Code With Docker

  1. Start the services with Docker:
     ```bash
     docker compose up -d
     ```
  2. Register the PostgreSQL database in the web UI at:

     http://localhost:5050
  3. Create the Azure resources (ADLS) using the Azure CLI.

  4. Trigger the DAG in the Airflow UI at:

     http://localhost:8080
  5. The DAG run loads the data into PostgreSQL and ADLS.
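`docker compose up -d` returns as soon as the containers start, which is usually before the web UIs respond. The sketch below polls a URL, such as the two UI addresses listed above, until it answers. It needs only `curl`; the function name `wait_for_url` and its default timing values are hypothetical choices.

```bash
# Hypothetical helper: poll a URL until it responds, or give up.
# curl --fail treats HTTP status >= 400 as an error; redirects (3xx) count
# as "up" because -L is not used.
# Usage: wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]
wait_for_url() {
  local url="$1" attempts="${2:-30}" delay="${3:-5}" i=1
  while [ "$i" -le "$attempts" ]; do
    if curl --silent --fail --output /dev/null "$url"; then
      echo "up: $url"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for $url" >&2
  return 1
}
```

For example, run `wait_for_url http://localhost:8080` and `wait_for_url http://localhost:5050` right after `docker compose up -d`. The default of 30 attempts, 5 seconds apart, is arbitrary; the first Airflow start can take a while.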
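Before registering the database in the web UI, it can help to confirm that PostgreSQL already accepts connections, for example with `pg_isready` inside its container. This sketch assumes a Compose service named `postgres` and a user named `airflow`; both are hypothetical, so check the project's Compose file for the real names. If the web UI on port 5050 also runs inside the Compose network, the host to enter when registering the server is the PostgreSQL service name, because `localhost` there refers to the UI's own container.

```bash
# Hypothetical helper: wait until PostgreSQL inside a Compose service
# accepts connections. Service and user defaults are placeholders.
# Usage: wait_for_postgres [SERVICE] [USER]
wait_for_postgres() {
  local service="${1:-postgres}" user="${2:-airflow}" i=0
  # pg_isready exits 0 once the server accepts connections.
  until docker compose exec -T "$service" pg_isready -U "$user" >/dev/null 2>&1; do
    i=$((i + 1))
    if [ "$i" -ge 30 ]; then
      echo "PostgreSQL in service '$service' is not ready" >&2
      return 1
    fi
    sleep 2
  done
  echo "PostgreSQL in service '$service' is accepting connections"
}
```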
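The README does not list the Azure CLI commands for creating the ADLS resources, so here is one plausible sketch. An ADLS Gen2 account is a `StorageV2` storage account with the hierarchical namespace enabled (`--hns true`), and the lake itself is a file system inside that account. The resource group, account, file system names, and the `westeurope` default region are placeholders, not values from this project; storage account names must be globally unique and use 3 to 24 lowercase letters and digits. Run `az login` first.

```bash
# Hypothetical helper: create the ADLS Gen2 resources the pipeline writes to.
# All names and the default region are placeholders; requires `az login`.
# Usage: create_adls RESOURCE_GROUP STORAGE_ACCOUNT FILE_SYSTEM [LOCATION]
create_adls() {
  local rg="$1" account="$2" fs="$3" location="${4:-westeurope}"

  # Resource group that holds the storage account.
  az group create --name "$rg" --location "$location" || return 1

  # StorageV2 with a hierarchical namespace is what makes it ADLS Gen2.
  az storage account create \
    --name "$account" \
    --resource-group "$rg" \
    --location "$location" \
    --sku Standard_LRS \
    --kind StorageV2 \
    --hns true || return 1

  # File system (container) for the football data. No credentials are
  # passed, so the CLI looks up the account key on its own.
  az storage fs create --name "$fs" --account-name "$account"
}
```

For example, `create_adls football-rg footballlake123 football` creates all three resources in the default region.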
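Besides the Airflow UI, the DAG can be triggered with the Airflow CLI inside the running containers. In this sketch, the Compose service name `airflow-webserver` and any DAG id you pass are placeholders; check `docker compose ps` and the Airflow UI for the real names. New DAGs start paused when Airflow's `dags_are_paused_at_creation` setting is true (its default), so the sketch unpauses the DAG before triggering it.

```bash
# Hypothetical helper: unpause and trigger a DAG through the Airflow CLI
# running inside a Docker Compose service.
# Usage: trigger_dag DAG_ID [SERVICE]
trigger_dag() {
  local dag_id="$1" service="${2:-airflow-webserver}"  # placeholder service

  # A paused DAG would queue the run but never start it.
  docker compose exec -T "$service" airflow dags unpause "$dag_id" || return 1

  # Queue a new DAG run; progress shows up in the UI at http://localhost:8080.
  docker compose exec -T "$service" airflow dags trigger "$dag_id"
}
```

For example, `trigger_dag football_etl` (a made-up DAG id) unpauses and triggers that DAG.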
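To confirm that the data reached both destinations, one option is to inspect them from the command line after the DAG run finishes. This sketch assumes a Compose service named `postgres` with user and database `airflow`; the table, file system, and storage account names are also placeholders, since the README does not name them.

```bash
# Hypothetical helper: spot-check both destinations after a DAG run.
# Every default below is a placeholder, not a value from this project.
# Usage: check_loads TABLE FILE_SYSTEM STORAGE_ACCOUNT [SERVICE] [USER] [DB]
check_loads() {
  local table="$1" fs="$2" account="$3"
  local service="${4:-postgres}" user="${5:-airflow}" db="${6:-airflow}"

  # Row count of the loaded table, via psql inside the Compose service.
  docker compose exec -T "$service" \
    psql -U "$user" -d "$db" -c "SELECT count(*) FROM $table;" || return 1

  # Files the pipeline pushed to the ADLS Gen2 file system.
  az storage fs file list --file-system "$fs" --account-name "$account" --output table
}
```

The table name is interpolated directly into the SQL, which is acceptable for a manual spot-check but not for untrusted input.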

Pipeline

  1. Fetches data from FBref.
  2. Cleans the data.
  3. Transforms the data.
  4. Loads data to PostgreSQL.
  5. Pushes the data to Azure Data Lake.
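In Airflow, steps like these are typically chained as tasks in a single line. With the default `all_success` trigger rule, a task runs only after its upstream task succeeds. The shell sketch below mimics that ordering outside Airflow. The stage functions are hypothetical stand-ins, not the project's real tasks, and the first failure stops the run.

```bash
# Hypothetical stand-ins for the pipeline tasks; replace the bodies with
# the project's real fetch/clean/transform/load/push logic.
fetch_data()     { echo "fetching from FBref"; }
clean_data()     { echo "cleaning"; }
transform_data() { echo "transforming"; }
load_postgres()  { echo "loading into PostgreSQL"; }
push_adls()      { echo "pushing to Azure Data Lake"; }

# Run the stages in order and stop at the first failure, like a linear
# chain of Airflow tasks under the default all_success trigger rule.
run_pipeline() {
  local stage
  for stage in fetch_data clean_data transform_data load_postgres push_adls; do
    echo "==> $stage"
    if ! "$stage"; then
      echo "stage $stage failed; downstream stages not run" >&2
      return 1
    fi
  done
  echo "pipeline finished"
}
```

Airflow adds scheduling, retries, and per-task logs on top of this basic ordering.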