visitante

A set of Hadoop, Spark and Storm based tools for web and customer analytics

Primary Language: Java

Introduction

The original goal of visitante was to calculate various web analytics metrics, as defined by Avinash Kaushik (http://www.kaushik.net/avinash/), on the Hadoop, Spark and Storm platforms. It has since evolved into a general purpose log analytics and mining solution that goes beyond web server logs.

It also includes customer and marketing analytics solutions. Since customer behavior data is mostly captured in logs, customer analytics and log analytics are closely related. Search analytics solutions have also been added recently.

Philosophy

  • Simple and easy to use batch and real time web analytics
  • Highly configurable

Blogs

The following blog posts of mine are a good source of details on visitante

Solutions

  • Hadoop based batch analytics for

    • Number of pages visited
    • Total time spent
    • Last page visited
    • Flow status (e.g., whether checkout flow was entered, entered but not completed or completed)
    • Incident detection
    • Pattern based event detection with context
    • Customer lifetime value
  • Storm based real time analytics for

    • Bounce rate
    • Visit depth distribution
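
To make a few of the metrics above concrete, here is a minimal in-memory sketch in plain Java. It is not visitante's actual Hadoop or Storm code. The class and method names (SessionMetricsSketch, PageView, SessionSummary, summarize, bounceRate, visitDepthDistribution) and the record layout (session ID, page, timestamp in seconds) are assumptions made for illustration only. Time spent is taken as the gap between the first and last page view of a session, so the time on the last page is not counted, and a bounce is assumed to mean a session with exactly one page view.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of per-session web metrics; not part of visitante. */
class SessionMetricsSketch {

    /** One page view; the field layout is an assumption, not visitante's log format. */
    static final class PageView {
        final String sessionId;
        final String page;
        final long timeSec;

        PageView(String sessionId, String page, long timeSec) {
            this.sessionId = sessionId;
            this.page = page;
            this.timeSec = timeSec;
        }
    }

    /** Running summary of one session: page count, time span and last page. */
    static final class SessionSummary {
        int numPages;
        long firstTimeSec = Long.MAX_VALUE;
        long lastTimeSec = Long.MIN_VALUE;
        String lastPage;

        void add(PageView view) {
            numPages++;
            if (view.timeSec < firstTimeSec) {
                firstTimeSec = view.timeSec;
            }
            if (view.timeSec >= lastTimeSec) {
                lastTimeSec = view.timeSec;
                lastPage = view.page;
            }
        }

        /** Gap between first and last page view; time on the last page is unknown. */
        long timeSpentSec() {
            return lastTimeSec - firstTimeSec;
        }
    }

    /** Groups page views by session ID, much as a reducer keyed on session ID would. */
    static Map<String, SessionSummary> summarize(List<PageView> views) {
        Map<String, SessionSummary> bySession = new LinkedHashMap<>();
        for (PageView view : views) {
            bySession.computeIfAbsent(view.sessionId, k -> new SessionSummary()).add(view);
        }
        return bySession;
    }

    /** Fraction of sessions with exactly one page view. */
    static double bounceRate(Map<String, SessionSummary> sessions) {
        if (sessions.isEmpty()) {
            return 0.0;
        }
        long bounces = 0;
        for (SessionSummary s : sessions.values()) {
            if (s.numPages == 1) {
                bounces++;
            }
        }
        return (double) bounces / sessions.size();
    }

    /** Maps visit depth (pages per session) to the number of sessions with that depth. */
    static Map<Integer, Integer> visitDepthDistribution(Map<String, SessionSummary> sessions) {
        Map<Integer, Integer> distribution = new TreeMap<>();
        for (SessionSummary s : sessions.values()) {
            distribution.merge(s.numPages, 1, Integer::sum);
        }
        return distribution;
    }

    public static void main(String[] args) {
        List<PageView> views = new ArrayList<>();
        views.add(new PageView("s1", "/home", 100));
        views.add(new PageView("s1", "/cart", 160));
        views.add(new PageView("s1", "/checkout", 220));
        views.add(new PageView("s2", "/home", 300));

        Map<String, SessionSummary> sessions = summarize(views);
        SessionSummary s1 = sessions.get("s1");
        System.out.println(s1.numPages + " " + s1.timeSpentSec() + " " + s1.lastPage);
        System.out.println(bounceRate(sessions));
        System.out.println(visitDepthDistribution(sessions));
    }
}
```

With the sample data in main, session s1 has 3 pages, 120 seconds spent and /checkout as its last page, and one of the two sessions is a bounce. In a batch job the same grouping would typically happen per session key in a reducer, and in a streaming job the summaries would be updated incrementally as events arrive.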

Build

For Hadoop 1

  • mvn clean install

For Hadoop 2 (non YARN)

  • git checkout nuovo
  • mvn clean install

For Hadoop 2 (YARN)

  • git checkout nuovo
  • mvn clean install -P yarn

For Spark

  • Build chombo first in master branch with
    • mvn clean install
    • sbt publishLocal
  • Build chombo-spark in chombo/spark directory
    • sbt clean package

Need help?

Please feel free to email me at pkghosh99@gmail.com

Contribution

Contributors are welcome. Please email me at pkghosh99@gmail.com