This project builds and manages an ETL pipeline for a Credit Card dataset and a Loan Application dataset. PySpark is used to extract, transform, and load the data into a MySQL database, and the stored data is then used for interactive querying and for generating visualizations. For more details, see the project requirements file. The project uses the following technologies:
- Python (Requests, MySQL Connector, Tabulate)
- MySQL Database
- Apache Spark (PySpark Core, PySpark SQL, PySpark DataFrame)
- Python Visualization and Analytics libraries (Matplotlib)
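The extract, transform, and load flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the JSON source, the 'FIRST_NAME' column, the environment variables, and the connection defaults are all assumptions, and writing over JDBC assumes the MySQL Connector/J driver is available to Spark.

```python
import os


def mysql_jdbc_options(database, table, host="localhost", port=3306):
    """Build the options Spark's JDBC writer needs (all values are placeholders)."""
    return {
        "url": f"jdbc:mysql://{host}:{port}/{database}",
        "dbtable": table,
        # Credentials come from hypothetical environment variables.
        "user": os.environ.get("MYSQL_USER", "root"),
        "password": os.environ.get("MYSQL_PASSWORD", ""),
        # Driver class shipped with MySQL Connector/J 8.x.
        "driver": "com.mysql.cj.jdbc.Driver",
    }


def run_etl(source_path, database, table):
    """Extract a JSON file, apply one example transform, and load into MySQL."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("etl_sketch").getOrCreate()
    try:
        # Extract: read the source file (JSON Lines by default) into a DataFrame.
        df = spark.read.json(source_path)
        # Transform: title-case a hypothetical FIRST_NAME column.
        df = df.withColumn("FIRST_NAME", F.initcap(F.col("FIRST_NAME")))
        # Load: append the rows to the target MySQL table over JDBC.
        (
            df.write.format("jdbc")
            .options(**mysql_jdbc_options(database, table))
            .mode("append")
            .save()
        )
    finally:
        spark.stop()
```

A call such as run_etl('data/customers.json', 'creditcard_db', 'customers') would run the whole flow; the path, database, and table names here are placeholders.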
To set up and run the project:
- Clone the entire repository to your local machine.
- Set up and activate a virtual environment in the project's root directory:
  - On Windows:
    - 'python -m venv venv' followed by 'venv\Scripts\activate'
  - On Mac:
    - 'python -m venv venv' followed by 'source venv/bin/activate'
- Install the required libraries by running:
  - 'pip install -r requirements.txt'
- Execute the main script:
  - 'python main.py'
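For readers who prefer to script the environment setup, the virtual environment steps above can be approximated with Python's standard 'venv' module. This is only a sketch: the helper names are hypothetical, and activation itself still has to happen in your shell, so the second function merely returns the command to run.

```python
import os
import venv
from pathlib import PurePosixPath, PureWindowsPath


def create_environment(env_dir="venv"):
    """Rough equivalent of 'python -m venv venv' (with pip installed)."""
    venv.create(env_dir, with_pip=True)


def activation_command(env_dir="venv", os_name=os.name):
    """Return the shell command that activates the environment on a platform."""
    if os_name == "nt":  # Windows
        return str(PureWindowsPath(env_dir) / "Scripts" / "activate")
    # macOS and other POSIX systems
    return f"source {PurePosixPath(env_dir) / 'bin' / 'activate'}"
```

Calling activation_command() with no arguments picks the command for the current platform based on os.name.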
Possible future improvements:
- Build a front-end web application to display data retrieved from the database.
- Use Tableau for richer data analysis and presentation.
This project is licensed under the MIT License.