You were recently hired by an e-commerce company. Your mission is to provide insights on sales.
There are four datasets:
- Products: a list of available products.
- Items: a list of items.
- Orders: a list of customer orders on the website.
- Customers: a list of customers.
Centralize your project and code in a GitHub repository and provide the URL once the test is completed.
To-dos
- Get the four datasets into Spark
- Every day, we want to compute summary statistics per customer (spending, number of orders, etc.). Create a Spark script that computes these summary statistics for a given day.
- Run that script over the necessary period to load the historical data. Then identify the top customers.
- How many customers are repeaters?
- Optional: if you want to show more of your skills, add anything you find useful:
  - Automate the script so that it runs every day
  - Deploy it in an "Infrastructure as Code" way
  - Add real-time processing to anything you want
  - Anything else you want to show
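
To make the expected outputs concrete, here is a minimal plain-Python sketch of the per-day customer summary and the repeater count. It is not a Spark script and not a reference solution. The field names (`order_id`, `customer_id`, `order_date`, `amount`), the sample rows, and the definition of a repeater as a customer with more than one order are all assumptions, since the brief does not specify the dataset schemas. In Spark, the same logic would typically be a filter on the order date followed by a `groupBy` on the customer and an aggregation.

```python
from collections import defaultdict
from datetime import date

# Hypothetical order records: the real Orders schema is not given in the brief.
orders = [
    {"order_id": 1, "customer_id": "c1", "order_date": date(2024, 1, 1), "amount": 20.0},
    {"order_id": 2, "customer_id": "c1", "order_date": date(2024, 1, 1), "amount": 5.0},
    {"order_id": 3, "customer_id": "c2", "order_date": date(2024, 1, 1), "amount": 12.5},
    {"order_id": 4, "customer_id": "c1", "order_date": date(2024, 1, 2), "amount": 7.5},
]


def daily_customer_summary(orders, day):
    """Return {customer_id: {"orders": n, "spending": total}} for a single day."""
    summary = defaultdict(lambda: {"orders": 0, "spending": 0.0})
    for order in orders:
        if order["order_date"] == day:
            stats = summary[order["customer_id"]]
            stats["orders"] += 1
            stats["spending"] += order["amount"]
    return dict(summary)


def count_repeaters(orders):
    """Count customers with more than one order (assumed definition of a repeater)."""
    orders_per_customer = defaultdict(int)
    for order in orders:
        orders_per_customer[order["customer_id"]] += 1
    return sum(1 for n in orders_per_customer.values() if n > 1)


# Summary for one given day, then the repeater count over the whole history.
day_summary = daily_customer_summary(orders, date(2024, 1, 1))
repeaters = count_repeaters(orders)
```

Running the daily function once per day over the historical period gives the history needed to rank the top customers, for example by total spending.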