classification_models_project: A Jupyter Notebook repository from bilalkaracik

This is a machine learning code written in Python using various modules such as numpy, pandas, seaborn, scikit-learn, statsmodels, matplotlib, xgboost, and lightgbm. The purpose of this code is to build and compare various classification models for breast cancer diagnosis based on patient's characteristics.

The code starts by importing the necessary modules and reading the dataset (breast-cancer.csv) into a Pandas dataframe. Then, it performs some exploratory data analysis (EDA) by checking the structure of the dataframe, the statistical summary of the numerical variables, and the distribution of the target variable (diagnosis).

Next, it splits the data into training and test sets, and starts building different classification models, including Logistic Regression, Gaussian Naive Bayes, K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and Random Forest. For each model, it fits the training data and makes predictions on the test set. It then evaluates the performance of each model using accuracy, confusion matrix, and classification report.

The code also includes model tuning for KNN, SVM, and RBF SVM using GridSearchCV to find the best hyperparameters. Finally, it compares the performance of all the models using cross-validation and chooses the best model for breast cancer diagnosis.

bilalkaracik/classification_models_project