ETL Pipeline for Credit Card and Loan Application Analysis

Overview

This project builds and manages an ETL pipeline for a Credit Card dataset and a Loan Application dataset. PySpark is used to extract the data, transform it, and load it into a MySQL database. The stored data is then used for interactive querying and for generating visualizations. For more details, see the project requirement file.
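The extract, transform, and load flow described above might be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (mysql_jdbc_options, run_etl), the JSON source format, and the first_name column are all assumptions made for the example. It also assumes the MySQL Connector/J driver is available to Spark.

```python
"""Sketch of an extract-transform-load flow with PySpark and MySQL."""


def mysql_jdbc_options(host, port, database, user, password):
    """Build the option dict Spark's JDBC data source expects for MySQL."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "driver": "com.mysql.cj.jdbc.Driver",  # MySQL Connector/J 8.x driver class
        "user": user,
        "password": password,
    }


def run_etl(spark, source_path, table, jdbc_options):
    """Extract a JSON file, apply a small cleanup, and append it to a MySQL table."""
    # Imported here so the helper above can be used without PySpark installed.
    from pyspark.sql import functions as F

    # Extract: assumes a JSON source; multiLine handles a file holding one JSON array.
    df = spark.read.option("multiLine", True).json(source_path)

    # Transform: hypothetical cleanup on a hypothetical "first_name" column.
    if "first_name" in df.columns:
        df = df.withColumn("first_name", F.initcap(F.col("first_name")))
    df = df.dropDuplicates()

    # Load: append the rows to the target table over JDBC.
    (
        df.write.format("jdbc")
        .options(**jdbc_options)
        .option("dbtable", table)
        .mode("append")
        .save()
    )
    return df
```

With an active SparkSession named spark, a call could look like run_etl(spark, "data/customers.json", "customers", mysql_jdbc_options("localhost", 3306, "demo_db", "etl_user", "secret")), where the path, table, database, and credentials are placeholders. Keeping the connection settings in a separate helper makes it easier to read them from environment variables or a config file rather than hard-coding them.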

Setup

1. Clone the entire repository to your local machine.
2. Set up and activate a virtual environment in the project's root directory:
   - On Windows:
     - 'python -m venv venv' followed by 'venv\Scripts\activate'
   - On macOS or Linux:
     - 'python -m venv venv' followed by 'source venv/bin/activate'
3. Install the required libraries by running:
   - 'pip install -r requirements.txt'
4. Execute the main script:
   - 'python main.py'

This project is licensed under the MIT License.