In this project, power users were identified through a detailed process involving feature engineering, model selection, and evaluation. The methodology used for identifying power users is outlined below:
- Data Augmentation: Additional features were created, such as the average number of products in a basket, to enrich the dataset and improve model performance.
- Target Column: Initially, power users were identified based on a z-score method. However, this method resulted in very few power users, making it insufficient. Therefore, based on the distributions, users who spent more than $110 were identified as power users, forming the basis for the classification target.
- Training Dataset Preparation: The training dataset was adjusted using various oversampling methods to handle class imbalances.
- Oversampling Techniques: Methods like RandomOverSample, SMOTE, and ADASYN were employed to balance the dataset, ensuring that the models could effectively learn from both the majority and minority classes.
- Hyperparameter Optimization: GridSearchCV was used for hyperparameter tuning to find the best model configurations.
- Comparison of Models: Several models were compared based on their performance metrics, including KNN, XGBoost, Logistic Regression, and Random Forest. The XGBoost model with ADASYN oversampling showed the best performance.
Model | Recall | Precision | Log Loss |
---|---|---|---|
KNN | 51.65% | 88.70% | 6.88% |
KNN_RandomOverSample | 47.25% | 23.50% | 15.45% |
KNN_SMOTE | 62.64% | 19.40% | 19.13% |
KNN_ADASYN | 28.57% | 35.60% | 24.31% |
XGBoost | 71.43% | 100.00% | 5.23% |
XGBoost_RandomOverSample | 71.43% | 55.60% | 2.70% |
XGBoost_SMOTE | 71.43% | 97.00% | 1.41% |
XGBoost_ADASYN | 71.43% | 100.00% | 1.33% |
Logistic Regression | 67.03% | 89.70% | 1.21% |
Logistic Regression_RandomOverSample | 78.02% | 10.50% | 14.46% |
Logistic Regression_SMOTE | 76.92% | 11.60% | 12.64% |
Logistic Regression_ADASYN | 84.62% | 6.40% | 24.08% |
RandomForest | 70.33% | 100.00% | 4.49% |
RandomForest_RandomOverSample | 63.74% | 87.90% | 5.12% |
RandomForest_SMOTE | 72.53% | 42.00% | 3.16% |
RandomForest_ADASYN | 74.73% | 32.70% | 4.43% |
- Threshold Tuning: The threshold value of the model was adjusted to optimize performance. Despite testing various thresholds, the default value of 0.5 was retained as it provided the best balance between recall and precision.
- ROC AUC Curve Analysis: The performance of the XGBoost ADASYN model was further validated using the ROC AUC curve, demonstrating strong predictive capabilities.
1.Build the Docker image using the Dockerfile provided. This command creates an image tagged as xgboost_adasyn_poweruser_image:v1.
```sh
docker build -t xgboost_adasyn_poweruser_image:v1 .
```
2.To verify that the image was built correctly, run it locally. This command runs the container in detached mode, mapping port 8000 of the container to port 8000 on the host.
```sh
docker run -d -p 8000:8000 --name xgboost_container xgboost_adasyn_poweruser_image:v1
```
3.Before pushing the image to Docker Hub, tag it appropriately with your Docker Hub username and repository name.
```sh
docker tag xgboost_adasyn_poweruser_image:v1 yaseminbellioglu/xgboost_adasyn_poweruser_image:v1
```
4.Finally, push the tagged image to Docker Hub. This step uploads the image to your Docker Hub repository, making it available for deployment on other machines.
```sh
docker push yaseminbellioglu/xgboost_adasyn_poweruser_image:v1
```
Google Kubernetes Engine (GKE) provides a managed environment for deploying, managing, and scaling your containerized applications using Google infrastructure.
- Navigate to the Google Kubernetes Engine in the Google Cloud Console.
- Click on "Enable" to enable the GKE API for your project.
First, an Artifact Registry repository was created to store the Docker image. Follow these steps to push your Docker image to Artifact Registry:
1-Authenticate with Google Cloud: Ensure you are logged in to your Google Cloud account.
--gcloud auth login
2-Set Your Project: Configure your project settings.
--gcloud config set project PROJECT_ID
3-Configure Docker to use the gcloud command-line tool to authenticate requests to Artifact Registry.
--gcloud auth configure-docker us-central1-docker.pkg.dev
4-Build and Tag Your Docker Image: Build the Docker image and tag it with the appropriate name.
--docker build -t xgboost_adasyn_poweruser_image:v1 .
--docker tag xgboost_adasyn_poweruser_image:v1 us-central1-docker.pkg.dev/psychic-root-424207-s9/fast-api/xgboost_adayn_poweruser_image:cloudingv1
- In the Google Cloud Console, go to the Kubernetes Engine section.
- Click on "Create" and select "Autopilot" mode.
- Configure the cluster settings such as the name, location, and other options as needed.
- Click on "Create" to provision the Autopilot cluster. Google will manage the nodes and scaling for you.
-
Upload the Docker Image to Artifact Registry:
- Create an Artifact Registry repository if you don't already have one.
- Push your Docker image to Artifact Registry using the following commands:
gcloud auth login gcloud config set project PROJECT_ID gcloud auth configure-docker us-central1-docker.pkg.dev docker tag xgboost_adasyn_poweruser_image:v1 us-central1-docker.pkg.dev/PROJECT_ID/REPOSITORY/xgboost_adasyn_poweruser_image:cloudingv1 docker push us-central1-docker.pkg.dev/PROJECT_ID/REPOSITORY/xgboost_adasyn_poweruser_image:cloudingv1
-
Deploy to GKE:
- Navigate to the "Workloads" section in the Kubernetes Engine.
- Click on "Deploy" and select "Container image".
- Enter the container image URL from Artifact Registry (e.g.,
us-central1-docker.pkg.dev/PROJECT_ID/REPOSITORY/xgboost_adasyn_poweruser_image:cloudingv1
). - Configure the deployment settings such as the number of replicas, resources, and other options as needed.
- Click on "Deploy" to create the deployment in your Autopilot cluster.
- Navigate to the "Services & Ingress" section in the Kubernetes Engine.
- Click on "Expose" for your deployment.
- Configure the service settings, such as the port and type (e.g., External load balancer for public access).
- Click on "Expose" to create the service. The service will provide an endpoint for accessing your application.
- In the "Services & Ingress" section, find the external IP address for your service.
- Open a browser and navigate to the external IP address followed by the appropriate path (e.g.,
http://EXTERNAL_IP/docs
) to access the FastAPI interface.
This setup allows you to deploy and manage your containerized application on Google Kubernetes Engine, leveraging the benefits of automatic scaling and managed infrastructure provided by Autopilot.