Objective: This project aims to develop a decision tree model that classifies countries by continent ("landmass" region) based on flag characteristics. The primary goal is to predict whether a country belongs to Europe or Oceania from its flag attributes.
Data Source: Flag dataset
Here's an overview of the tasks, carried out in Python with Scikit-Learn and other libraries:
- Exploratory Data Analysis (EDA) and Decision Tree Creation: I'll begin by exploring the dataset and checking the distribution of the two landmass categories (Europe and Oceania). Then, I'll build a decision tree model with Scikit-Learn to classify countries.
- Data Visualization: I'll create plots to understand the relationships between predictor variables and to inspect the decision tree structure.
- Data Preprocessing: I'll select the relevant predictor variables and encode the categorical features.
- Data Splitting and Decision Tree Fitting: I'll split the dataset into training and testing subsets, fit a decision tree classifier on the training subset, and evaluate its accuracy on the testing subset.
- Hyperparameter Tuning: I'll tune the tree's maximum depth (max_depth) and cost-complexity pruning parameter (ccp_alpha) to improve model performance.
- Pruning the Decision Tree: I'll try different values of ccp_alpha to prune the decision tree, with the aim of improving its generalization.
- Visualization of the Final Decision Tree: I'll plot the final decision tree fitted with the selected hyperparameters.
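
As a rough sketch of the class-distribution check, preprocessing, splitting, and fitting steps listed above: the snippet below runs on a small synthetic table, because the actual flag file and its columns are not shown here. Every column name and value (stripes, has_red, main_hue) is a hypothetical placeholder, and the 75/25 split and one-hot encoding are assumptions rather than the project's confirmed choices.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the flag data. All column names and values below are
# hypothetical placeholders, not the real dataset.
rng = np.random.default_rng(0)
n_rows = 120
flags = pd.DataFrame({
    "landmass": rng.choice(["Europe", "Oceania"], size=n_rows),
    "stripes": rng.integers(0, 6, size=n_rows),
    "has_red": rng.integers(0, 2, size=n_rows),
    "main_hue": rng.choice(["red", "blue", "green"], size=n_rows),
})

# EDA: check how the two landmass categories are distributed.
class_counts = flags["landmass"].value_counts()

# Preprocessing: select predictors and one-hot encode the categorical column.
predictors = ["stripes", "has_red", "main_hue"]
X = pd.get_dummies(flags[predictors], columns=["main_hue"])
y = flags["landmass"]

# Splitting: hold out 25% of the rows for testing (an assumed ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)

# Fitting and evaluation: accuracy on the held-out subset.
tree = DecisionTreeClassifier(random_state=1)
tree.fit(X_train, y_train)
accuracy = tree.score(X_test, y_test)

print(class_counts)
print(f"test accuracy: {accuracy:.3f}")
```

Since the synthetic labels are random, the accuracy printed here carries no meaning; the point is only the order of the steps.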
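
The tuning and pruning steps could look roughly like the following. This is a minimal sketch on synthetic data from make_classification, not the flag dataset. The candidate ranges for max_depth and ccp_alpha are arbitrary, and scoring candidates by 5-fold cross-validation on the training subset is an assumption, since the overview does not say how candidates are compared.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data as a stand-in for the encoded flag features.
X, y = make_classification(
    n_samples=200, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1, stratify=y
)


def cv_accuracy(**params):
    """Mean 5-fold cross-validated accuracy on the training subset."""
    model = DecisionTreeClassifier(random_state=1, **params)
    return cross_val_score(model, X_train, y_train, cv=5).mean()


# Step 1: sweep the maximum depth (candidate range is an assumption).
depths = list(range(1, 11))
depth_scores = [cv_accuracy(max_depth=d) for d in depths]
best_depth = depths[int(np.argmax(depth_scores))]

# Step 2: with the depth fixed, try several ccp_alpha values to prune the tree.
alphas = np.linspace(0.0, 0.05, 11)
alpha_scores = [cv_accuracy(max_depth=best_depth, ccp_alpha=a) for a in alphas]
best_alpha = float(alphas[int(np.argmax(alpha_scores))])

# Step 3: refit with the selected hyperparameters and score on the test subset.
final_tree = DecisionTreeClassifier(
    max_depth=best_depth, ccp_alpha=best_alpha, random_state=1
).fit(X_train, y_train)
test_accuracy = final_tree.score(X_test, y_test)

print(f"best max_depth: {best_depth}, best ccp_alpha: {best_alpha:.3f}")
print(f"leaves: {final_tree.get_n_leaves()}, test accuracy: {test_accuracy:.3f}")
```

A fitted tree such as final_tree can then be drawn with sklearn.tree.plot_tree, which requires matplotlib. Candidate ccp_alpha values can also be taken from DecisionTreeClassifier.cost_complexity_pruning_path instead of a fixed grid.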