This project demonstrates the implementation of various machine learning models on the `bill_authentication.csv` dataset. The tasks include data loading and cleaning, Decision Tree classification, K-Means clustering, and evaluation of classification and linear regression algorithms. The goal is to analyze financial data for predictive insights.
- Objective: Load and preprocess the dataset to ensure it is ready for analysis.
- Steps:
- Import the necessary libraries (`pandas`).
- Load the dataset using `pd.read_csv`.
- Check for missing values using `data.isnull().sum()`.
- Display basic statistical details with `data.describe()`.
- Outcome: The dataset is clean with no missing values, and preliminary insights are gathered.
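The loading and cleaning steps above can be sketched as follows. This is a minimal illustration, not the project's exact script: the helper name `load_and_inspect` is hypothetical, and the file is assumed to sit in the working directory under the name `bill_authentication.csv`.

```python
import os

import pandas as pd


def load_and_inspect(source):
    """Hypothetical helper: load a CSV (path or file-like object) and print diagnostics."""
    data = pd.read_csv(source)
    # Count missing values per column; an all-zero result means no gaps.
    missing = data.isnull().sum()
    print("Missing values per column:\n", missing)
    # Count, mean, std, min, quartiles and max for each numeric column.
    print(data.describe())
    return data, missing


if __name__ == "__main__":
    path = "bill_authentication.csv"  # assumed location of the dataset
    if os.path.exists(path):
        load_and_inspect(path)
```

Returning the missing-value counts alongside the DataFrame lets later steps confirm programmatically that `missing.sum() == 0` before training.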
- Objective: Implement a Decision Tree classifier to categorize the data.
- Steps:
- Split the data into features (`X`) and target (`y`).
- Split the data into training and testing sets using `train_test_split`.
- Train the Decision Tree model (`DecisionTreeClassifier`).
- Evaluate the model using `classification_report` and `accuracy_score`.
- Outcome: The model achieved an accuracy of 98.54%, demonstrating high performance.
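A minimal scikit-learn sketch of the classification steps above follows. The target column name `Class`, the 80/20 split, and the fixed random seed are assumptions rather than details taken from the project; the demo uses synthetic data from `make_classification` as a stand-in for the real CSV, so its accuracy will not match the figure reported above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def train_decision_tree(data: pd.DataFrame, target_col: str = "Class", seed: int = 0):
    # Features are every column except the (assumed) target column.
    X = data.drop(columns=[target_col])
    y = data[target_col]
    # Hold out 20% of rows for testing (split ratio is an assumption).
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = DecisionTreeClassifier(random_state=seed)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(classification_report(y_test, y_pred))
    return model, accuracy_score(y_test, y_pred)


if __name__ == "__main__":
    # Synthetic stand-in: 4 numeric features and a binary label.
    X_syn, y_syn = make_classification(
        n_samples=500, n_features=4, n_informative=3, n_redundant=0, random_state=0
    )
    demo = pd.DataFrame(X_syn, columns=["f1", "f2", "f3", "f4"])
    demo["Class"] = y_syn
    _, acc = train_decision_tree(demo)
    print(f"Accuracy: {acc:.4f}")
```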
- Objective: Apply K-Means clustering to identify patterns in the data.
- Steps:
- Determine the optimal number of clusters using the Elbow Method.
- Fit the K-Means model with the chosen number of clusters (`n_clusters=3`).
- Assign cluster labels to the dataset.
- Outcome: The Elbow Method suggested 3 clusters, and the data was successfully segmented.
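The Elbow Method and final clustering can be sketched as below, assuming scikit-learn's `KMeans`. The helper names `elbow_inertias` and `cluster`, the range k = 1..10, and the `n_init=10` setting are illustrative choices, and the demo uses synthetic blobs from `make_blobs` instead of the real features. The "elbow" itself is read off by eye from the inertia curve; this sketch does not detect it automatically.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs


def elbow_inertias(X, k_max=10, seed=0):
    """Within-cluster sum of squares (inertia) for k = 1..k_max."""
    inertias = []
    for k in range(1, k_max + 1):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed)
        km.fit(X)
        inertias.append(km.inertia_)
    return inertias


def cluster(X, n_clusters=3, seed=0):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    # fit_predict fits the model and returns one cluster label per row;
    # with a DataFrame these could be stored as e.g. data["Cluster"] = labels.
    return km.fit_predict(X)


if __name__ == "__main__":
    # Synthetic 4-feature data with 3 blobs, standing in for the real features.
    X, _ = make_blobs(n_samples=300, n_features=4, centers=3, random_state=0)
    for k, inertia in enumerate(elbow_inertias(X), start=1):
        print(k, round(inertia, 1))
    labels = cluster(X, n_clusters=3)
    print(np.bincount(labels))  # number of rows assigned to each cluster
```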
- Objective: Assess the performance of the Decision Tree model using metrics.
- Steps:
- Generate a confusion matrix.
- Calculate precision, recall, and F1-score.
- Outcome: High precision (1.0), recall (0.967), and F1-score (0.983) indicate robust performance.
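The relationship between the confusion matrix and the three metrics can be made explicit with a short sketch. It assumes binary labels encoded as 0/1 with 1 as the positive class; the helper names are hypothetical. The last line checks that the reported precision (1.0) and recall (0.967) are consistent with the reported F1-score, since F1 is their harmonic mean, 2PR / (P + R).

```python
from sklearn.metrics import confusion_matrix


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall: F1 = 2PR / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(y_true, y_pred):
    # With labels=[0, 1] the matrix is [[TN, FP], [FN, TP]].
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    return {
        "tn": tn, "fp": fp, "fn": fn, "tp": tp,
        "precision": precision,
        "recall": recall,
        "f1": f1_from_precision_recall(precision, recall),
    }


# Consistency check on the reported figures.
print(round(f1_from_precision_recall(1.0, 0.967), 3))  # 0.983
```

In practice `precision_score`, `recall_score`, and `f1_score` from `sklearn.metrics` give the same values; computing them by hand just shows where each number comes from.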
- Objective: Implement and evaluate a Linear Regression model.
- Steps:
- Create a dummy target variable for regression.
- Split the data into training and testing sets.
- Train the Linear Regression model (`LinearRegression`).
- Evaluate using Mean Squared Error (MSE) and the R-squared score.
- Outcome: The model achieved an MSE of 0.189 and an R-squared score of 0.878, indicating a good fit.
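The regression workflow can be sketched as follows. The project does not say how its dummy target was built, so here a hypothetical target is generated as a noisy linear combination of synthetic features; the helper name `fit_and_score`, the coefficients, and the 80/20 split are all assumptions, and the resulting scores will not match the figures above.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, seed=0):
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # MSE: mean of squared residuals (lower is better).
    # R^2: 1 - SS_res / SS_tot, the share of variance explained (1.0 is perfect).
    return mean_squared_error(y_test, y_pred), r2_score(y_test, y_pred)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical dummy target: noisy linear combination of 4 synthetic features.
    X = rng.normal(size=(400, 4))
    y = X @ np.array([1.5, -2.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=400)
    mse, r2 = fit_and_score(X, y)
    print(f"MSE={mse:.3f}  R^2={r2:.3f}")
```

Because a dummy target built from the features is, by construction, partly predictable from them, a high R-squared here mainly confirms that the pipeline works rather than revealing structure in the data.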
- Decision Tree Classification: Accuracy of 98.54%.
- K-Means Clustering: Optimal clusters identified (3).
- Linear Regression: MSE of 0.189 and R-squared of 0.878.
This project highlights the effectiveness of machine learning models in analyzing financial data. The Decision Tree classifier performed exceptionally well, while K-Means clustering revealed meaningful patterns. The Linear Regression model also demonstrated strong predictive capabilities. These findings underscore the potential of machine learning in financial predictive analytics.
- Dataset: `bill_authentication.csv`