
Click-Through Rate Prediction with CatBoost

This study focuses on improving Click-Through Rate (CTR) prediction in online advertising and content recommendation systems. Accurate CTR prediction is vital for optimizing marketing efforts and enhancing user experiences. A Kaggle competition dataset is used for this purpose, and the prediction model is developed with the CatBoost algorithm.

CatBoost is a supervised machine learning algorithm based on gradient-boosted decision trees and is used for both classification and regression tasks. Its name reflects its two primary attributes: native handling of categorical data ("Cat") and the use of gradient boosting ("Boost").
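
As a rough sketch of how CatBoost's native categorical handling can be applied to CTR-style data (the column names, values, and hyperparameters below are illustrative assumptions, not taken from this repository's notebooks):

```python
from catboost import CatBoostClassifier, Pool
import pandas as pd

# Illustrative data: a few categorical ad/context features and a binary click label.
df = pd.DataFrame({
    "site_category": ["news", "sports", "news", "games"],
    "device_type":   ["mobile", "desktop", "mobile", "mobile"],
    "banner_pos":    [0, 1, 0, 1],
    "click":         [0, 1, 0, 1],
})

cat_features = ["site_category", "device_type"]  # columns CatBoost should treat as categorical
X, y = df.drop(columns="click"), df["click"]

# Gradient-boosted decision trees; categorical columns are handled natively,
# so no manual one-hot encoding is required.
model = CatBoostClassifier(
    iterations=200,
    learning_rate=0.1,
    depth=6,
    eval_metric="AUC",
    verbose=False,
)
model.fit(Pool(X, y, cat_features=cat_features))

# Predicted click probabilities (the CTR estimates).
print(model.predict_proba(X)[:, 1])
```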

CatBoost is a good choice, but the best model depends on your specific dataset and experiments. It's always a good idea to compare CatBoost's performance with other models like LightGBM, XGBoost, and Factorization Machines to find the optimal model for your task.

One of the main purposes of this study is to use the Weights & Biases (W&B) platform and to see how model development results can be read from W&B in an efficient way.
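
A minimal sketch of how a CatBoost run could be logged to W&B (the project name, helper function, parameters, and metric names here are assumptions for illustration; the actual setup lives in model_development.ipynb):

```python
import wandb
from catboost import CatBoostClassifier
from sklearn.metrics import roc_auc_score, log_loss

def train_and_log(X_train, y_train, X_val, y_val, cat_features, params):
    # Start a W&B run; "ctr-catboost" is a placeholder project name.
    run = wandb.init(project="ctr-catboost", config=params)

    model = CatBoostClassifier(cat_features=cat_features, verbose=False, **params)
    model.fit(X_train, y_train, eval_set=(X_val, y_val))

    # Log validation metrics so runs can be compared in the W&B dashboard.
    val_pred = model.predict_proba(X_val)[:, 1]
    wandb.log({
        "val_auc": roc_auc_score(y_val, val_pred),
        "val_logloss": log_loss(y_val, val_pred),
    })
    run.finish()
    return model
```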

Dataset

The dataset used in this study is obtained from the Kaggle competition in the link. It contains 11 days of click data and is split into 3 files: train, test, and submission. In this study, only a random sample of 10,000 rows from the training set is used for quick analysis and model development. Subsampling is done without considering class balance.
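
A sketch of that subsampling step (assuming the Kaggle training file has been extracted to train.csv; adjust the paths and random seed to match the notebooks):

```python
import pandas as pd

# Load the training file and draw a simple random sample of 10,000 rows.
# No stratification is applied, so the click/no-click imbalance is preserved
# only in expectation, not enforced.
train = pd.read_csv("train.csv")
sample = train.sample(n=10000, random_state=42)
sample.to_csv("train_sample.csv", index=False)
```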

Environment

To install the dependencies needed to run the notebooks, you can use Anaconda. Once you have installed Anaconda, run:

$ conda env create -f environment.yml
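
After the environment is created, activate it before launching the notebooks (the environment name is defined in environment.yml, so replace the placeholder below accordingly):

$ conda activate <env-name>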

Notebooks

The CTR-EDA-feature-engineering.ipynb notebook includes the exploratory data analysis and feature engineering steps for the dataset. model_development.ipynb includes all model development and hyperparameter tuning, tracked with the Weights & Biases platform.
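
For the hyperparameter tuning part, a W&B sweep can be configured roughly as follows (the searched parameters, ranges, and project name are illustrative assumptions, not the exact configuration used in model_development.ipynb):

```python
import wandb

# Random search over a few common CatBoost hyperparameters.
sweep_config = {
    "method": "random",
    "metric": {"name": "val_auc", "goal": "maximize"},
    "parameters": {
        "depth":         {"values": [4, 6, 8]},
        "learning_rate": {"min": 0.01, "max": 0.3},
        "iterations":    {"values": [200, 500, 1000]},
    },
}

def sweep_run():
    # Each sweep trial reads its hyperparameters from the run config.
    run = wandb.init()
    params = dict(run.config)
    # ... train a CatBoost model with `params` and log "val_auc" here ...
    run.finish()

sweep_id = wandb.sweep(sweep_config, project="ctr-catboost")
wandb.agent(sweep_id, function=sweep_run, count=20)
```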

Proposed Resources

Throughout this study, the following websites and notebooks were used. You can also get help from these resources.

Contribution

If you want to contribute, please send a pull request. All contributions are welcome!

Please check this repository for updates, and feel free to open issues or send pull requests.