Tax Fraud Detection System

This repository contains a system for detecting potential tax fraud in financial data.

Project Structure

data_pipeline.ipynb: Jupyter notebook containing the data generation, exploration, feature engineering, model training, and evaluation pipeline.
app.py: Streamlit application that serves the trained model as a web app for real-time fraud prediction.

Note: The trained_model.pkl file generated by the Jupyter notebook is not included in the repository because of its potential size. Run the notebook to generate it.

The notebook (data_pipeline.ipynb) runs the following stages:

Data Generation: Simulates a dataset of financial transactions with features like income, expenses, tax liability, and fraud indicators.
Data Exploration: Analyzes the generated data to understand the relationships between features and potential fraud.
Feature Engineering: Creates a "Fraud" feature using anomaly detection techniques.
Model Training: Trains a Random Forest classification model to predict tax fraud based on financial data.
Model Evaluation: Evaluates the performance of the trained model using metrics like accuracy, precision, recall, F1 score, and ROC-AUC score.
Model Saving: Saves the trained model as trained_model.pkl for later use in the web app.
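
The stages above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the column names (Income, Expenses, Tax_Liability), the simulated distributions, the use of Isolation Forest as the anomaly detection technique, and the 5% contamination rate are all assumptions made for the example.

```python
import pickle

import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Hypothetical column names; the notebook's real schema may differ.
FEATURES = ["Income", "Expenses", "Tax_Liability"]


def simulate_transactions(n_rows=2000, seed=42):
    """Simulate financial records, with roughly 5% of rows under-reporting tax."""
    rng = np.random.default_rng(seed)
    income = rng.normal(60_000, 15_000, n_rows).clip(min=5_000)
    expenses = income * rng.uniform(0.3, 0.8, n_rows)
    tax = (income - expenses) * 0.2
    outliers = rng.random(n_rows) < 0.05
    tax[outliers] *= rng.uniform(0.0, 0.3, outliers.sum())
    return pd.DataFrame(
        {"Income": income, "Expenses": expenses, "Tax_Liability": tax}
    )


def label_fraud(df, contamination=0.05, seed=42):
    """Add a binary "Fraud" column from Isolation Forest anomaly flags."""
    detector = IsolationForest(contamination=contamination, random_state=seed)
    flags = detector.fit_predict(df[FEATURES])  # -1 = anomaly, 1 = normal
    labeled = df.copy()
    labeled["Fraud"] = (flags == -1).astype(int)
    return labeled


def train_and_evaluate(df, seed=42):
    """Train a Random Forest on the labeled data and score it on a held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[FEATURES], df["Fraud"],
        test_size=0.2, stratify=df["Fraud"], random_state=seed,
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    scores = model.predict_proba(X_test)[:, 1]  # probability of the fraud class
    metrics = {
        "accuracy": accuracy_score(y_test, predicted),
        "precision": precision_score(y_test, predicted, zero_division=0),
        "recall": recall_score(y_test, predicted, zero_division=0),
        "f1": f1_score(y_test, predicted, zero_division=0),
        "roc_auc": roc_auc_score(y_test, scores),
    }
    return model, metrics


def save_model(model, path="trained_model.pkl"):
    """Pickle the trained model so the web app can load it later."""
    with open(path, "wb") as f:
        pickle.dump(model, f)
```

A typical run would chain the functions: `data = label_fraud(simulate_transactions())`, then `model, metrics = train_and_evaluate(data)`, then `save_model(model)`. Because the labels come from an unsupervised detector rather than confirmed fraud cases, the metrics measure how well the classifier reproduces the anomaly flags, not real-world fraud detection accuracy.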

The web app (app.py) lets you:

Upload a CSV file containing financial transaction data.
View the uploaded data.
Make real-time predictions on whether each transaction is likely fraudulent using the trained model.
Download the predicted data with a new "Predicted Fraud" column.
Explore basic exploratory data analysis (EDA) visualizations of the uploaded data, including:
- Value counts of predicted fraud
- Correlation heatmap

Note: This web app requires the trained_model.pkl file to be present in the same directory as app.py so that it can load the trained model.
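
The prediction and EDA steps listed above might look something like the helpers below. This is a sketch under assumptions, not the code in app.py: the function names and the `feature_columns` parameter are hypothetical, and `model` stands for any object with a scikit-learn style `predict` method, such as the model unpickled from trained_model.pkl.

```python
import pandas as pd


def add_predictions(df, model, feature_columns):
    """Return a copy of df with a "Predicted Fraud" column from model.predict."""
    result = df.copy()
    result["Predicted Fraud"] = model.predict(result[feature_columns])
    return result


def fraud_value_counts(predicted_df):
    """Count how many rows fall into each predicted class."""
    return predicted_df["Predicted Fraud"].value_counts()


def correlation_matrix(predicted_df):
    """Pairwise correlations of the numeric columns, the input to a heatmap."""
    return predicted_df.select_dtypes("number").corr()


def to_csv_bytes(predicted_df):
    """Encode the predicted data as CSV bytes for a download button."""
    return predicted_df.to_csv(index=False).encode("utf-8")
```

In a Streamlit app, helpers like these would typically sit between `st.file_uploader` (to read the CSV into a DataFrame), `st.dataframe` (to display it), and `st.download_button` (to offer the output of `to_csv_bytes`). How app.py actually wires these together may differ.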

Clone this repository.
Install the required libraries with pip install -r requirements.txt (this assumes a requirements.txt file listing the dependencies is present).
For the data pipeline:
- Open data_pipeline.ipynb in Jupyter Notebook and run all the cells to generate the data, train the model, and save it as trained_model.pkl.
For the web app:
- Run streamlit run app.py from the command line in the project directory.
- This launches the Streamlit app in your web browser, where you can upload a CSV file and view predictions.

Go to the hosted application: https://taxpayer-fraud-detection.streamlit.app/
Download Test_Dataset.csv from the GitHub repository and upload it to the app (this file is for testing only).
Enjoy the output ✨🍿

You can replace the simulated data generation in the Jupyter notebook with your actual financial data for training the model.
The web app provides a basic set of EDA visualizations. You can customize it further to include additional visualizations based on your needs.