
2020-2021 Fall Semester Data Mining (CENG3521) Class Final Project


OnlineShoppersIntention-EDA-Classification-Clustering

This repository contains our final project for the 2020-2021 Fall Semester Data Mining (CENG3521) class. The code is written in Python, with Jupyter Notebook as the development environment, and uses the Numpy, Pandas, Matplotlib, Seaborn and Sklearn libraries.

Team Members (sorted by name)


Ahmet GÜRBÜZ


Principal Component Analysis (PCA)
Clustering (K-Means)

Murat GUN


Classification (SGDClassifier)
Classification (MLPClassifier)

Onur DUMAN


Exploratory Data Analysis (EDA)
Classification (k-NN)
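
As a rough illustration of how the PCA and K-Means parts listed above can fit together, here is a minimal sketch. It is not the notebook's code: the helper name `pca_kmeans`, the choice of two components and two clusters, the scaling step, and the synthetic `make_blobs` data are all assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_kmeans(X, n_components=2, n_clusters=2, random_state=0):
    """Scale the features, project them with PCA, then cluster with K-Means.

    Hypothetical helper; the parameter defaults are illustrative assumptions.
    """
    # PCA is sensitive to feature scale, so standardize first.
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components, random_state=random_state)
    X_reduced = pca.fit_transform(X_scaled)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(X_reduced)
    return X_reduced, labels


# Synthetic stand-in data (assumption); the real input would be the cleaned dataset.
X, _ = make_blobs(n_samples=300, n_features=8, centers=2, random_state=0)
X_reduced, labels = pca_kmeans(X)
print(X_reduced.shape, sorted(set(labels)))
```

Reducing to two components before clustering also makes the clusters easy to plot with Matplotlib or Seaborn.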

Required Modules and Their Versions

  • Python 3.7.4

  • Numpy 1.16.5

  • Pandas 0.25.1

  • Matplotlib 3.1.1

  • Seaborn 0.9.0

  • Sklearn 0.21.3
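
To compare a local environment against the versions listed above, something like the following sketch may help. The helper name `installed_versions` is hypothetical, and the import names are assumed to be the usual ones (`sklearn` for Sklearn).

```python
import importlib

# Versions as listed in this README, keyed by the assumed import name.
REQUIRED = {
    "numpy": "1.16.5",
    "pandas": "0.25.1",
    "matplotlib": "3.1.1",
    "seaborn": "0.9.0",
    "sklearn": "0.21.3",
}


def installed_versions(modules=REQUIRED):
    """Return {module name: installed version, or None if unavailable}."""
    found = {}
    for name in modules:
        try:
            module = importlib.import_module(name)
            found[name] = getattr(module, "__version__", None)
        except ImportError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: installed={version}, listed={REQUIRED[name]}")
```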

Description of The Online Shoppers Intention Dataset


Dataset Information:

The dataset consists of feature vectors belonging to 12,330 sessions. It was formed so that each session belongs to a different user within a 1-year period, to avoid any tendency toward a specific campaign, special day, user profile, or period.

Description of The Features

Administrative: This is the number of pages of this type (administrative) that the user visited.
Administrative_Duration: This is the amount of time spent in this category of pages.
Informational: This is the number of pages of this type (informational) that the user visited.
Informational_Duration: This is the amount of time spent in this category of pages.
ProductRelated: This is the number of pages of this type (product related) that the user visited.
ProductRelated_Duration: This is the amount of time spent in this category of pages.
BounceRates: The percentage of visitors who enter the website through that page and exit without triggering any additional tasks.
ExitRates: The percentage of pageviews on the website that end at that specific page.
PageValues: The average value of the page averaged over the value of the target page and/or the completion of an eCommerce
SpecialDay: This value represents the closeness of the browsing date to special days or holidays (eg Mother's Day or Valentine's day) in
Month: Contains the month the pageview occurred, in string form.
OperatingSystems: An integer value representing the operating system that the user was on when viewing the page.
Browser: An integer value representing the browser that the user was using to view the page.
Region: An integer value representing which region the user is located in.
TrafficType: An integer value representing what type of traffic the user is categorized into.
VisitorType: A string representing whether a visitor is New Visitor, Returning Visitor, or Other.
Weekend: A boolean representing whether the session is on a weekend.
Revenue: A boolean representing whether or not the user completed the purchase.
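
Since the features above mix integers, floats, strings and booleans, most scikit-learn estimators need them converted to numbers first. The sketch below shows one possible encoding and is not necessarily what the notebook does: the helper name `encode_features`, the use of one-hot encoding, and the tiny hand-made sample (which covers only a subset of the columns) are assumptions.

```python
import pandas as pd


def encode_features(df):
    """Return (X, y): a numeric feature frame and a 0/1 purchase label.

    Hypothetical helper; one-hot encoding is an illustrative choice.
    """
    y = df["Revenue"].astype(int)              # True/False -> 1/0
    X = df.drop(columns=["Revenue"]).copy()
    X["Weekend"] = X["Weekend"].astype(int)    # True/False -> 1/0
    # One-hot encode the two string-valued columns.
    X = pd.get_dummies(X, columns=["Month", "VisitorType"])
    return X, y


# Made-up rows for illustration only; real data would come from
# pd.read_csv("online_shoppers_intention.csv").
sample = pd.DataFrame({
    "ProductRelated": [1, 20, 5],
    "BounceRates": [0.2, 0.0, 0.05],
    "Month": ["Feb", "Nov", "Nov"],
    "VisitorType": ["Returning Visitor", "New Visitor", "Returning Visitor"],
    "Weekend": [False, True, False],
    "Revenue": [False, True, False],
})
X, y = encode_features(sample)
print(X.columns.tolist())
```

The integer-coded columns (OperatingSystems, Browser, Region, TrafficType) are left untouched here, although they are categorical in meaning and could be one-hot encoded the same way.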

Dataset Origin:

https://archive.ics.uci.edu/ml/datasets/Online+Shoppers+Purchasing+Intention+Dataset

Source:

C. Okan Sakar Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Bahcesehir University, 34349 Besiktas, Istanbul, Turkey

Yomi Kastro Inveon Information Technologies Consultancy and Trade, 34335 Istanbul, Turkey

Relevant Papers

Sakar, C.O., Polat, S.O., Katircioglu, M. et al. Neural Comput & Applic (2018)

Jupyter Notebook Content

Accuracy Scores

(Figure: accuracy scores)
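
For readers who want to see how accuracy scores for the three classifiers used in this project (SGDClassifier, MLPClassifier, k-NN) could be computed side by side, here is a minimal sketch. It does not reproduce the project's numbers: the helper name `accuracy_table`, the hyperparameters, the 75/25 split, the scaling step, and the synthetic `make_classification` data are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def accuracy_table(X, y, random_state=0):
    """Fit each classifier on a train split and return test accuracies.

    Hypothetical helper; all settings are illustrative assumptions.
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state
    )
    models = {
        "SGDClassifier": SGDClassifier(random_state=random_state),
        "MLPClassifier": MLPClassifier(max_iter=500, random_state=random_state),
        "k-NN": KNeighborsClassifier(n_neighbors=5),
    }
    scores = {}
    for name, model in models.items():
        # Scale inside the pipeline so the test split never leaks into fitting.
        pipeline = make_pipeline(StandardScaler(), model)
        pipeline.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, pipeline.predict(X_test))
    return scores


# Synthetic, imbalanced stand-in data (assumption), not the real dataset.
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.85, 0.15], random_state=0
)
for name, score in accuracy_table(X, y).items():
    print(f"{name}: {score:.3f}")
```

On an imbalanced label such as Revenue, accuracy alone can look high even for a weak model, so precision and recall are worth checking alongside it.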

File Structure

Folder

  • Report - Includes the project report.

Jupyter Notebook

  • OnlineShoppersIntention.ipynb - Notebook used to clean, classify and cluster the dataset.

Txt

  • requirements.txt - Lists the modules we used and their versions.

Dataset

  • online_shoppers_intention.csv - The dataset we used.

  • online_shoppers_intention_cleaned.csv - The cleaned dataset in its final state, just before classification.