Obese-tree is a GitHub repository showcasing the application of a Support Vector Machine (SVM) model to estimate obesity levels based on eating habits and physical condition. Explore the code, data, and Jupyter notebooks to learn how SVM can be used for predictive modeling in the context of health and wellness.

Obese-tree - Support Vector Machine (SVM) for Obesity Level Estimation

This README gives an overview of a project that applies a Support Vector Machine (SVM) model to estimate obesity levels from eating habits and physical condition. The dataset contains the independent variables Gender, Age, Height, Weight, family_history_with_overweight, FAVC, FCVC, NCP, CAEC, Smoking, CH2O, SCC, FAF, TUE, CALC, and MTRANS, and the dependent variable, Obesity Category.

Dataset Description

The dataset includes the following columns:

  • Gender: Gender of the individuals (Categorical: 'Female' or 'Male').
  • Age: Age of the individuals.
  • Height: Height of the individuals.
  • Weight: Weight of the individuals.
  • family_history_with_overweight: Family history of overweight (Categorical: 'yes' or 'no').
  • FAVC: Frequent consumption of high-calorie food (Categorical: 'no' or 'yes').
  • FCVC: Frequency of consumption of vegetables.
  • NCP: Number of main meals.
  • CAEC: Consumption of food between meals (Categorical: 'Sometimes', 'Frequently', 'Always', or 'no').
  • Smoking: Smoking habits (Categorical: 'no' or 'yes').
  • CH2O: Daily water consumption.
  • SCC: Calorie consumption monitoring (Categorical: 'no' or 'yes').
  • FAF: Physical activity frequency.
  • TUE: Time using technology devices.
  • CALC: Consumption of alcohol (Categorical: 'no', 'Sometimes', 'Frequently', or 'Always').
  • MTRANS: Mode of transportation (Categorical: 'Public_Transportation', 'Walking', 'Automobile', 'Motorbike', 'Bike').
  • Obesity Category: Dependent variable with categories: 'Normal_Weight', 'Overweight_Level_I', 'Overweight_Level_II', 'Obesity_Type_I', 'Insufficient_Weight', 'Obesity_Type_II', 'Obesity_Type_III'.

Data Preprocessing

  • The dataset was split into a training set and a test set.
  • Categorical data in columns such as 'Gender', 'family_history_with_overweight', 'FAVC', 'CAEC', 'Smoking', 'SCC', 'CALC', and 'MTRANS' were label encoded.
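
The two preprocessing steps can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the toy rows, the 80/20 split ratio, the `random_state`, and the choice to encode before splitting are all assumptions made for the example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy rows; the real dataset has more rows and columns.
df = pd.DataFrame({
    "Gender": ["Female", "Male", "Male", "Female", "Male"] * 2,
    "CAEC": ["Sometimes", "Frequently", "no", "Always", "Sometimes"] * 2,
    "Weight": [64.0, 56.0, 77.0, 87.0, 89.8] * 2,
    "Obesity Category": [
        "Normal_Weight", "Normal_Weight", "Overweight_Level_I",
        "Overweight_Level_I", "Obesity_Type_I",
    ] * 2,
})

# Label encode each categorical feature column, keeping one encoder
# per column so the integer codes can be mapped back to strings later.
categorical_columns = ["Gender", "CAEC"]
encoders = {}
for column in categorical_columns:
    encoder = LabelEncoder()
    df[column] = encoder.fit_transform(df[column])
    encoders[column] = encoder

X = df.drop(columns="Obesity Category")
y = LabelEncoder().fit_transform(df["Obesity Category"])

# Assumed split: 20% of the rows held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
```

`LabelEncoder` assigns integer codes in sorted order of the class strings, so `encoders["Gender"].classes_` records which code belongs to which category.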

Feature Scaling

  • The dataset was standardized using StandardScaler to ensure that features had similar scales, which is important for SVM.
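
A short sketch of standardization with `StandardScaler`, using made-up feature values (an assumption for illustration). A common pattern, assumed here, is to fit the scaler on the training set only and reuse its statistics for the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical feature rows (age, height, weight); not taken from the dataset.
X_train = np.array([
    [21.0, 1.62, 64.0],
    [23.0, 1.80, 77.0],
    [27.0, 1.80, 87.0],
    [22.0, 1.78, 89.8],
])
X_test = np.array([[29.0, 1.62, 53.0]])

scaler = StandardScaler()
# Learn per-feature mean and standard deviation from the training set.
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same training-set statistics to the test set.
X_test_scaled = scaler.transform(X_test)
```

After scaling, each training feature has zero mean and unit variance, so no single feature dominates the distance computations an SVM relies on.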

Model Training

  • An SVM model was trained with the linear kernel.
  • The random state was set to 0 for reproducibility.
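
The training step might look like the sketch below. The data is a synthetic stand-in generated with `make_classification` (an assumption, chosen only so the example runs on its own); the linear kernel and `random_state=0` come from the description above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 16 features and 7 classes,
# mirroring the shape of the obesity dataset but not its values.
X, y = make_classification(
    n_samples=300, n_features=16, n_informative=8,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardize using training-set statistics only.
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Linear-kernel SVM with a fixed random state, as described above.
classifier = SVC(kernel="linear", random_state=0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)
```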

Model Evaluation

  • A confusion matrix was generated to assess model performance (rows are true classes, columns are predicted classes):
[[56  0  0  0  0  0  0]
 [ 5 53  0  0  0  4  0]
 [ 0  0 75  2  0  0  1]
 [ 0  0  1 57  0  0  0]
 [ 0  0  0  0 63  0  0]
 [ 0  2  0  0  0 52  2]
 [ 0  0  0  0  0  2 48]]
  • Accuracy score: 0.9550827423167849
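
As a quick consistency check, the accuracy can be recomputed from the confusion matrix alone: the diagonal holds the correctly classified samples, and the sum of all entries is the test set size. The variable names below are illustrative.

```python
# Confusion matrix as reported above (rows: true class, columns: predicted).
confusion = [
    [56, 0, 0, 0, 0, 0, 0],
    [5, 53, 0, 0, 0, 4, 0],
    [0, 0, 75, 2, 0, 0, 1],
    [0, 0, 1, 57, 0, 0, 0],
    [0, 0, 0, 0, 63, 0, 0],
    [0, 2, 0, 0, 0, 52, 2],
    [0, 0, 0, 0, 0, 2, 48],
]

# Correct predictions sit on the main diagonal.
correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total

# Row sums give the number of true samples per class.
support = [sum(row) for row in confusion]

print(correct, total, accuracy)
```

This gives 404 correct predictions out of 423 test samples, which matches the reported accuracy score.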

Classification Report

              precision    recall  f1-score   support

           0       0.92      1.00      0.96        56
           1       0.96      0.85      0.91        62
           2       0.99      0.96      0.97        78
           3       0.97      0.98      0.97        58
           4       1.00      1.00      1.00        63
           5       0.90      0.93      0.91        56
           6       0.94      0.96      0.95        50

    accuracy                           0.96       423
   macro avg       0.95      0.96      0.95       423
weighted avg       0.96      0.96      0.95       423
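
The per-class figures in the report can be reproduced from the confusion matrix. The sketch below rebuilds label vectors that yield the same matrix and passes them to scikit-learn's `precision_recall_fscore_support`; the reconstruction is only an illustration, since the notebook works with the real test labels and predictions.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Confusion matrix from the evaluation step (rows: true, columns: predicted).
cm = np.array([
    [56, 0, 0, 0, 0, 0, 0],
    [5, 53, 0, 0, 0, 4, 0],
    [0, 0, 75, 2, 0, 0, 1],
    [0, 0, 1, 57, 0, 0, 0],
    [0, 0, 0, 0, 63, 0, 0],
    [0, 2, 0, 0, 0, 52, 2],
    [0, 0, 0, 0, 0, 2, 48],
])

# Expand each cell (i, j) into cm[i, j] pairs of (true=i, predicted=j).
true_idx, pred_idx = np.indices(cm.shape)
y_true = np.repeat(true_idx.ravel(), cm.ravel())
y_pred = np.repeat(pred_idx.ravel(), cm.ravel())

# With the default average=None, one value per class is returned.
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)

for label in range(cm.shape[0]):
    print(label, round(precision[label], 2), round(recall[label], 2),
          round(f1[label], 2), support[label])
```

Precision for a class is its diagonal entry divided by its column sum, and recall is the diagonal entry divided by its row sum. For class 1, that is 53/55 and 53/62 respectively.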

K-Fold Cross Validation

  • K-fold cross-validation was used, resulting in a mean accuracy of 94.20% and a standard deviation of 1.34%.
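
A minimal sketch of how such a mean and standard deviation can be obtained with `cross_val_score`. The number of folds is not stated above, so `cv=10` is an assumption, as is the synthetic stand-in data; the printed numbers will therefore differ from the reported ones.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 16 features, 7 classes.
X, y = make_classification(
    n_samples=300, n_features=16, n_informative=8,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)

# Scaling inside the pipeline is refit on each training fold.
model = make_pipeline(StandardScaler(), SVC(kernel="linear", random_state=0))

# One accuracy score per fold; cv=10 is an assumed fold count.
scores = cross_val_score(model, X, y, cv=10)
mean_accuracy = scores.mean()
std_accuracy = scores.std()

print(f"Accuracy: {mean_accuracy * 100:.2f} %")
print(f"Standard deviation: {std_accuracy * 100:.2f} %")
```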

Grid Search for Hyperparameter Tuning

  • Grid Search was performed to find the best hyperparameters, yielding the following result:
    • Best Accuracy: 94.20%
    • Best Parameters: {'C': 1, 'kernel': 'linear'}
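
A grid search of this kind can be sketched with `GridSearchCV`. The parameter grid, the fold count, and the synthetic data below are assumptions for illustration; the grid actually searched in the notebook is not listed here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): 16 features, 7 classes.
X, y = make_classification(
    n_samples=300, n_features=16, n_informative=8,
    n_classes=7, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
X_train = StandardScaler().fit_transform(X_train)

# Hypothetical grid: a linear kernel and an RBF kernel with a few settings.
param_grid = [
    {"C": [0.25, 0.5, 0.75, 1], "kernel": ["linear"]},
    {"C": [0.25, 0.5, 0.75, 1], "kernel": ["rbf"], "gamma": [0.1, 0.5, 0.9]},
]

grid_search = GridSearchCV(
    estimator=SVC(random_state=0),
    param_grid=param_grid,
    scoring="accuracy",
    cv=5,  # assumed fold count
)
grid_search.fit(X_train, y_train)

print(f"Best Accuracy: {grid_search.best_score_ * 100:.2f} %")
print("Best Parameters:", grid_search.best_params_)
```

`best_score_` is the mean cross-validated accuracy of the best parameter combination. When the winning combination equals the settings of the baseline model and the same folds are used, it coincides with the k-fold mean accuracy.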

This project demonstrates the application of a linear-kernel SVM to obesity level estimation, reaching about 95.5% accuracy on the 423-sample test set and a mean accuracy of 94.20% under k-fold cross-validation.