Predict the intention (make a purchase or not) of e-commerce website visitors.
http://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset
The dataset consists of feature vectors belonging to 12,330 sessions. It was formed so that each session belongs to a different user within a 1-year period, to avoid any bias toward a specific campaign, special day, user profile, or period. Of the 12,330 sessions, 84.5% (10,422) are negative class samples that did not end with shopping, and the remaining 15.5% (1,908) are positive class samples that ended with shopping.
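
The class split matters when reading accuracy figures: a trivial model that always predicts "no purchase" would already be right for every negative session. A minimal sketch of that arithmetic, using only the session counts quoted above (the variable names are illustrative, and the baseline model is hypothetical):

```python
# Session counts taken from the dataset description above.
total_sessions = 12_330
negative_sessions = 10_422  # sessions that did not end with shopping
positive_sessions = 1_908   # sessions that ended with shopping

# The two classes should account for every session.
assert negative_sessions + positive_sessions == total_sessions

negative_share = negative_sessions / total_sessions
positive_share = positive_sessions / total_sessions

# Accuracy of a hypothetical baseline that always predicts the majority class.
majority_baseline_accuracy = max(negative_share, positive_share)

print(f"negative share: {negative_share:.1%}")
print(f"positive share: {positive_share:.1%}")
print(f"majority-class baseline accuracy: {majority_baseline_accuracy:.1%}")
```

Because this baseline is already high, precision, recall, and F1-score on the positive class are more informative here than accuracy alone.
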
Feature name | Feature description | Min. value | Max. value | Std. dev. |
---|---|---|---|---|
Admin. | Number of account-management pages visited by the visitor | 0 | 27 | 3.32 |
Admin. duration | Seconds spent by the visitor on account-management pages | 0 | 3,398 | 176.70 |
Info. | Number of informational pages visited by the visitor | 0 | 24 | 1.26 |
Info. duration | Seconds spent by the visitor on informational pages | 0 | 2,549 | 140.64 |
Prod. | Number of product-related pages visited by the visitor | 0 | 705 | 44.45 |
Prod. duration | Seconds spent by the visitor on product-related pages | 0 | 63,973 | 1,912.3 |
Bounce rate | Average bounce rate value of the pages visited by the visitor | 0 | 0.2 | 0.04 |
Exit rate | Average exit rate value of the pages visited by the visitor | 0 | 0.2 | 0.05 |
Page value | Average page value of the pages visited by the visitor | 0 | 361 | 18.55 |
Special day | Closeness of the site visiting time to a special day | 0 | 1.0 | 0.19 |

Feature name | Feature description | Number of values |
---|---|---|
OperatingSystems | Operating system of the visitor | 8 |
Browser | Browser of the visitor | 13 |
Region | Geographic region from which the session has been started by the visitor | 9 |
TrafficType | Traffic source (e.g., banner, SMS, direct) | 20 |
VisitorType | Visitor type: “New Visitor,” “Returning Visitor,” or “Other” | 3 |
Weekend | Boolean value indicating whether the visit date falls on a weekend | 2 |
Month | Month value of the visit date | 12 |
Revenue | Class label: whether the visit has been finalized with a transaction | 2 |
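
Classifiers generally expect numeric inputs, so categorical columns such as VisitorType usually have to be encoded before training. The source does not say which encoding was used; the sketch below shows one-hot encoding as one common option, in plain Python. The helper `one_hot` and the sample sessions are hypothetical and only illustrate the idea.

```python
def one_hot(value, categories):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


# Category labels as listed in the table above.
VISITOR_TYPES = ["New Visitor", "Returning Visitor", "Other"]

# Hypothetical sessions, for illustration only (not rows from the dataset).
sessions = [
    {"VisitorType": "Returning Visitor", "Weekend": False},
    {"VisitorType": "New Visitor", "Weekend": True},
]

# Three one-hot columns for VisitorType, followed by Weekend as 0/1.
encoded = [
    one_hot(s["VisitorType"], VISITOR_TYPES) + [int(s["Weekend"])]
    for s in sessions
]
print(encoded)
```

The same approach would apply to the other categorical columns, at the cost of one extra column per distinct value.
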

The main goal of this project is to design a machine learning classification system that predicts an online shopper's intention (buy or no buy) based on the values of the given features (from Google Analytics). Several classification algorithms are tested in order to pick the best one for the project.
In this project, we used the Online Shoppers Purchasing Intention dataset to build models that classify website visitors and predict which of them are likely to make a purchase on the website. Seven classifiers (Naive Bayes, KNN, SVM, Logistic Regression, Random Forest, Gradient Boosting, and AdaBoost) were tested and optimized. Gradient Boosting achieved the best classification performance (accuracy 0.905, F1-score 0.689), followed by Random Forest and then AdaBoost when ranked by F1-score.
Classifier | Accuracy | F1-Score | Precision | Recall |
---|---|---|---|---|
Naive Bayes | 0.775 | 0.491 | 0.394 | 0.652 |
KNN | 0.873 | 0.506 | 0.723 | 0.390 |
SVM | 0.889 | 0.600 | 0.751 | 0.500 |
Logistic Regression | 0.879 | 0.529 | 0.758 | 0.406 |
Random Forest | 0.902 | 0.662 | 0.774 | 0.578 |
Gradient Boost | 0.905 | 0.689 | 0.761 | 0.630 |
AdaBoost | 0.889 | 0.624 | 0.713 | 0.555 |
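
As a sanity check on the table, F1 is the harmonic mean of precision and recall, so the F1 column can be recomputed from the other two. The sketch below does that and ranks the classifiers by reported F1. The numbers are copied from the table; the function and variable names are illustrative and not part of the project's code.

```python
# (precision, recall, reported F1) copied from the results table above.
results = {
    "Naive Bayes": (0.394, 0.652, 0.491),
    "KNN": (0.723, 0.39, 0.506),
    "SVM": (0.751, 0.5, 0.6),
    "Logistic Regression": (0.758, 0.406, 0.529),
    "Random Forest": (0.774, 0.578, 0.662),
    "Gradient Boost": (0.761, 0.63, 0.689),
    "AdaBoost": (0.713, 0.555, 0.624),
}


def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


recomputed = {
    name: f1_from_precision_recall(p, r) for name, (p, r, _) in results.items()
}

for name, (_, _, reported) in results.items():
    print(f"{name:20s} reported={reported:.3f} recomputed={recomputed[name]:.3f}")

# Rank classifiers by reported F1, best first.
ranking = sorted(results, key=lambda name: results[name][2], reverse=True)
print(ranking[:3])
```

Small differences between the reported and recomputed values are expected, since precision and recall in the table are themselves rounded to three decimals.
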