Welcome to the Spaceship Titanic Kaggle competition showcase! This repository encapsulates my journey through the competition, culminating in a top 5% performance with a precision score of 0.8057 for binary classification. Initially undertaken as a learning project, it now stands as a valuable addition to my portfolio, showcasing the skills and techniques I've acquired in the realm of machine learning.
Here lies the initial simple model, serving as a baseline for subsequent comparisons.
The first iteration of the pipeline, presenting a straightforward approach to data processing and modeling.
A visual treat! This file offers informative graphs and visualizations, shedding light on the feature engineering process.
The star of the show! This file hosts the main pipeline, featuring the final prediction model along with an extensive grid search.
Witness the transformation! Engage with features like 'Travelling_Solo', 'GroupSize', 'Cabin_Deck', 'Cabin_Number', and more, crafted to enhance the predictive power of the model.
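As a rough illustration of how such features can be derived from the raw competition columns (a sketch, not the repository's exact code; the `PassengerId` and `Cabin` formats follow the competition's data dictionary):

```python
import pandas as pd

def add_engineered_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch of the feature engineering described above (illustrative only)."""
    df = df.copy()
    # PassengerId looks like 'gggg_pp'; passengers sharing 'gggg' travel as a group.
    group_id = df["PassengerId"].str.split("_").str[0]
    df["GroupSize"] = group_id.map(group_id.value_counts())
    df["Travelling_Solo"] = df["GroupSize"] == 1
    # Cabin looks like 'Deck/Number/Side'; split it into three separate features.
    cabin_parts = df["Cabin"].str.split("/", expand=True)
    df["Cabin_Deck"] = cabin_parts[0]
    df["Cabin_Number"] = pd.to_numeric(cabin_parts[1], errors="coerce")
    df["Cabin_Side"] = cabin_parts[2]
    return df
```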
A bespoke transformer class, meticulously designed for cleaning and imputing various features.
A custom feature union class orchestrating the collaboration of different transformers.
A nimble transformer specializing in the numerical feature transformation domain.
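For context, a custom scikit-learn transformer follows a standard pattern: subclass `BaseEstimator` and `TransformerMixin` and implement `fit` and `transform`. A minimal, generic skeleton (not the repository's actual `GeneralCleaner`) looks like this:

```python
from sklearn.base import BaseEstimator, TransformerMixin

class ExampleCleaner(BaseEstimator, TransformerMixin):
    """Generic cleaning/imputing transformer template (illustrative only)."""

    def __init__(self, fill_value=0):
        self.fill_value = fill_value

    def fit(self, X, y=None):
        # A real cleaner might learn per-column medians or modes here.
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is left untouched.
        return X.copy().fillna(self.fill_value)
```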
- Data Cleaning and Imputation: The `GeneralCleaner` class takes charge of cleaning and imputing, ensuring a pristine dataset.
- Feature Selection: The `FeatureSelector` class elegantly separates numeric and categorical features.
- Imputation: Enter the `CustomImputer` class, stepping in to fill in missing values with precision.
- One-Hot Encoding: The `CustomDummify` class executes a tailored one-hot encoding strategy, with an option to drop the first column.
- Scaling: The `CustomScaler` class handles the numerical features, offering a choice between standard and robust scaling.
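Wired together, the steps above might form a pipeline along these lines (the import path and constructor arguments are assumptions based on the descriptions here; see `feature_union_pipe.py` for the real signatures):

```python
from sklearn.pipeline import Pipeline, FeatureUnion
# Assumed import path and signatures; consult the source for the real ones.
from feature_union_pipe import (
    GeneralCleaner, FeatureSelector, CustomImputer, CustomDummify, CustomScaler,
)

preprocessing = Pipeline(steps=[
    ("clean", GeneralCleaner()),
    ("features", FeatureUnion(transformer_list=[
        ("numeric", Pipeline(steps=[
            ("select", FeatureSelector(kind="numeric")),      # assumed argument
            ("impute", CustomImputer()),
            ("scale", CustomScaler(method="robust")),         # assumed argument
        ])),
        ("categorical", Pipeline(steps=[
            ("select", FeatureSelector(kind="categorical")),  # assumed argument
            ("impute", CustomImputer()),
            ("dummify", CustomDummify(drop_first=True)),      # assumed argument
        ])),
    ])),
])
```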
Embark on a journey through various models, including Random Forest, Gradient Boosting, AdaBoost, SVM, and more. The final contenders for optimization are LightGBM, Random Forest, and XGBoost.
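Comparing candidates with cross-validation might look like this (a sketch assuming `X_train_processed` and `y_train` come out of the preprocessing pipeline; the repository's actual candidate list and evaluation code may differ):

```python
from sklearn.ensemble import (
    RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier,
)
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

candidates = {
    "random_forest": RandomForestClassifier(random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
    "adaboost": AdaBoostClassifier(random_state=42),
    "svm": SVC(random_state=42),
}
for name, model in candidates.items():
    # 5-fold cross-validated accuracy for a quick, like-for-like comparison.
    scores = cross_val_score(model, X_train_processed, y_train, cv=5)
    print(f"{name}: {scores.mean():.4f} +/- {scores.std():.4f}")
```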
The thrilling grid search unfolds! Tune hyperparameters for selected models, adjusting settings like the number of estimators, learning rate, and maximum depth for optimal performance.
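A minimal grid search over those hyperparameters, shown here for LightGBM (the grid values are illustrative; the repository's actual search space is more extensive):

```python
from lightgbm import LGBMClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [200, 500, 1000],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7],
}
search = GridSearchCV(
    LGBMClassifier(random_state=42), param_grid, cv=5, n_jobs=-1,
)
search.fit(X_train_processed, y_train)
print(search.best_params_, search.best_score_)
```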
These helper functions encapsulate model fitting and the seamless creation of submission-ready prediction CSV files, in the spirit of the sketch below.
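A sketch of what such a helper might look like (not the repository's actual function; the `PassengerId` and `Transported` columns follow the competition's required submission format):

```python
import pandas as pd

def fit_and_save_predictions(model, X_train, y_train, X_test, passenger_ids,
                             path="submission.csv"):
    """Fit the final model and write a competition-format prediction CSV."""
    model.fit(X_train, y_train)
    # The competition expects boolean True/False in the Transported column.
    predictions = model.predict(X_test).astype(bool)
    submission = pd.DataFrame({"PassengerId": passenger_ids,
                               "Transported": predictions})
    submission.to_csv(path, index=False)
```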
Peek behind the curtain! The `preview_df` function unveils a transformed DataFrame, providing a snapshot of the preprocessing magic.
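A plausible shape for such a helper (hypothetical sketch; the real `preview_df` may take different arguments):

```python
import pandas as pd

def preview_df(df, pipeline, n_rows=5):
    """Hypothetical sketch: run the preprocessing pipeline and
    return the first few transformed rows for inspection."""
    transformed = pipeline.fit_transform(df)
    return pd.DataFrame(transformed).head(n_rows)
```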
Embark on your own exploration:
- Review the Python files, particularly `feature_union_pipe.py`.
- Tailor configurations and hyperparameters to your requirements.
- Execute the code to immerse yourself in the training and evaluation of models.
- Delve into additional visualizations and analyses in `explanatory.py`.
- Utilize the provided functions to fit the final model and save predictions.
This project owes its existence to the Spaceship Titanic Kaggle competition. A heartfelt thank you to Kaggle for the dataset and the community for fostering valuable insights and discussions. Feel free to contribute and enhance this project—it's an open canvas for collaborative improvement.