TECHNICAL TEST: ANALYZE E-COMMERCE DATA

You were recently hired by an e-commerce company. Your mission is to provide insights on sales.

There are four datasets:

  • Products: a list of available products.
  • Items: a list of items.
  • Orders: a list of customer orders on the website.
  • Customers: a list of customers.

Centralize your project and code in a GitHub repository and provide the URL once the test is completed.

To Dos

  1. Load the four datasets into Spark.
  2. We want to compute summary statistics per customer every day (spending, orders, etc.). Create a Spark script that computes these summary statistics for a given day.
  3. Run that script over the necessary period to backfill historical data. Then identify the top customers.
  4. How many customers are repeaters?
  5. Optional: if you want to show more of your skills, add anything you find useful:
    • To automate this and make it run every day
    • To deliver it in an "Infrastructure-as-Code" way
    • To add real-time processing to anything you want
    • Anything else you want to show
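
To make the expected outputs of steps 2 to 4 concrete, here is a minimal sketch in plain Python (not Spark) of the logic involved. Everything in it is an assumption made for illustration: the row layout (order_id, customer_id, order_date, amount), the sample values, the function names, ranking top customers by total spending, and the definition of a repeater as a customer with more than one order overall. The real datasets may use different fields, and you are free to choose and justify other definitions.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order rows: (order_id, customer_id, order_date, amount).
# Field layout and values are invented for illustration only.
ORDERS = [
    ("o1", "c1", date(2024, 1, 1), 20.0),
    ("o2", "c1", date(2024, 1, 1), 5.0),
    ("o3", "c2", date(2024, 1, 1), 12.5),
    ("o4", "c1", date(2024, 1, 2), 7.5),
    ("o5", "c3", date(2024, 1, 2), 30.0),
]


def daily_summary(orders, day):
    """Step 2: per-customer order count and total spending for one day."""
    stats = defaultdict(lambda: {"orders": 0, "spending": 0.0})
    for _order_id, customer_id, order_date, amount in orders:
        if order_date == day:
            stats[customer_id]["orders"] += 1
            stats[customer_id]["spending"] += amount
    return dict(stats)


def backfill(orders):
    """Step 3: run the daily summary for every day present in the data."""
    days = sorted({order_date for _, _, order_date, _ in orders})
    return {day: daily_summary(orders, day) for day in days}


def top_customers(history, n=2):
    """Step 3: rank customers by total spending (assumed ranking criterion)."""
    totals = defaultdict(float)
    for summary in history.values():
        for customer_id, stats in summary.items():
            totals[customer_id] += stats["spending"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:n]


def count_repeaters(history):
    """Step 4: assumed definition, customers with more than one order overall."""
    order_counts = defaultdict(int)
    for summary in history.values():
        for customer_id, stats in summary.items():
            order_counts[customer_id] += stats["orders"]
    return sum(1 for count in order_counts.values() if count > 1)


if __name__ == "__main__":
    history = backfill(ORDERS)
    print(daily_summary(ORDERS, date(2024, 1, 1)))
    print(top_customers(history))
    print(count_repeaters(history))
```

In a Spark script, the same per-customer grouping would typically be expressed with a DataFrame `groupBy` followed by `agg`, with the target day passed in as a script argument.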