Finance_ETL_Pipeline

An ETL pipeline for processing and analyzing credit card and loan application data, enabling insightful financial decision-making.

ETL Pipeline for Credit Card and Loan Application Analysis

Overview

This project builds and manages an ETL pipeline for a credit card dataset and a loan application dataset. PySpark is used to extract, transform, and load the data into a MySQL database. The stored data is then used for interactive querying and for generating visualizations. For more details, see the project requirement file. The project uses the following technologies:

  • Python (Requests, MySQL Connector, Tabulate)
  • MySQL Database
  • Apache Spark (PySpark Core, PySpark SQL, PySpark DataFrame)
  • Python Visualization and Analytics libraries (Matplotlib)
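
As a rough illustration of the extract, transform, and load flow described above, here is a minimal sketch. It is not the code from this repository: the function names, the `first_name` column, the database name, and the credentials are all hypothetical, and the input is assumed to be a JSON Lines file.

```python
from typing import Dict


def build_jdbc_options(host: str, port: int, database: str, table: str,
                       user: str, password: str) -> Dict[str, str]:
    """Assemble the options Spark's JDBC writer needs to reach MySQL.

    All argument values are supplied by the caller; nothing here is
    taken from the project's actual configuration.
    """
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        "user": user,
        "password": password,
        # Driver class shipped with MySQL Connector/J 8.x
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def run_etl(json_path: str, jdbc_options: Dict[str, str]) -> None:
    """Extract a JSON file, apply one example transform, load into MySQL."""
    # Imported lazily so build_jdbc_options works without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("finance_etl_sketch").getOrCreate()
    try:
        # Extract: assumes one JSON object per line (JSON Lines).
        raw = spark.read.json(json_path)

        # Transform: "first_name" is a hypothetical column used only
        # to show the pattern; initcap capitalizes each word.
        cleaned = raw.withColumn("first_name", F.initcap(F.col("first_name")))

        # Load: append the rows to the target MySQL table over JDBC.
        cleaned.write.format("jdbc").options(**jdbc_options).mode("append").save()
    finally:
        spark.stop()


if __name__ == "__main__":
    # Hypothetical connection details, for illustration only.
    options = build_jdbc_options(
        "localhost", 3306, "finance_db", "customers", "etl_user", "secret"
    )
    print(options["url"])
```

Calling `run_etl` additionally requires the MySQL Connector/J JAR to be available to Spark (for example through the `spark.jars` setting) and a reachable MySQL server, so only the option-building helper runs as-is.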

Workflow Diagram

[Figure: workflow diagram]

Getting Started

  1. Clone the entire repository to your local machine.
  2. Set up and activate a virtual environment in the project's root directory:
    • On Windows:
      • 'python -m venv venv' followed by 'venv\Scripts\activate'
    • On macOS or Linux:
      • 'python -m venv venv' followed by 'source venv/bin/activate'
  3. Install the required libraries by running:
    • 'pip install -r requirements.txt'
  4. Execute the main script:
    • 'python main.py'

Output Preview

[Figure: interactive query output]

Possible Improvements

  • Build a front-end web application to display data retrieved from the database.
  • Use Tableau for richer data analysis and presentation.

License

This project is licensed under the MIT License.