Data Analysis of Shopping Behavior on Double Eleven

Using a Taobao dataset from Double Eleven to analyze and predict user behavior

Introduction

  • Get a basic overview of the festival, such as the total transaction volume, the proportion of buyers by age and gender, and the trend compared to last year.

  • Analyze users' behaviors and determine how they relate to the final buy behavior; in other words, which kinds of behavior lead to a purchase.

  • Predict whether a buyer will buy from Taobao or not.

  • Finally, visualize all of the results above.
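
As an illustration of the overview goal above, the share of buyers per group can be computed with a few lines of Python. This is a minimal sketch on made-up records: the tuple layout (user, group, action) and the action labels such as "buy" are assumptions for illustration, not the real schema of the dataset.

```python
from collections import Counter

# Hypothetical log records: (user_id, gender, action).
# The layout and the action labels are assumptions, not the dataset's schema.
SAMPLE_LOG = [
    (1, "F", "click"),
    (1, "F", "buy"),
    (2, "M", "click"),
    (3, "F", "cart"),
    (3, "F", "buy"),
    (4, "M", "buy"),
]


def buyer_share_by_group(records):
    """Return each group's share of distinct buyers."""
    # Count every user at most once, even if they bought several times.
    buyers = {(user, group) for user, group, action in records if action == "buy"}
    counts = Counter(group for _, group in buyers)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in counts.items()}


shares = buyer_share_by_group(SAMPLE_LOG)
for group, share in sorted(shares.items()):
    print(f"{group}: {share:.1%}")
```

The same grouping works for age ranges by passing the age bucket in place of the gender field.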

Steps

  1. Get the dataset, preprocess it, and load it into HDFS.
  2. Use Hive to further process the dataset.
  3. Use Spark to predict returning customers.
  4. Visualize the results; the plan is to use JavaWeb.
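
The preprocessing in step 1 could look roughly like the sketch below, which drops the header row and any row with an empty field before the file is uploaded. The column names in the sample are hypothetical, and the actual cleaning rules used for this project may differ.

```python
import csv
import io


def preprocess(raw_text):
    """Drop the header row and every row that has an empty field."""
    reader = csv.reader(io.StringIO(raw_text))
    next(reader, None)  # skip the header line
    cleaned = [
        row for row in reader
        if row and all(field.strip() for field in row)
    ]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(cleaned)
    return out.getvalue()


# Hypothetical sample; the real dataset's columns are not shown here.
RAW = "user_id,item_id,action\n1,100,click\n2,,buy\n3,102,buy\n"
CLEAN = preprocess(RAW)
print(CLEAN, end="")
```

A header-free file like this is convenient to load as a plain-text Hive table after it has been copied into HDFS.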

Environment

  • Linux
  • Hadoop
  • MySQL
  • Sqoop
  • Hive
  • Spark
  • Java 1.8
  • Python3