By Kristine Petrosyan
As a data scientist, I have been tasked with drawing insights from a Kaggle dataset published by the Brazilian retailer Olist. In particular, I seek to answer the following questions, which are of interest to stakeholders:
- Customer LTV (lifetime value)
- Monthly performance of the business
- Best selling categories
- Prediction for future sales
The Jupyter Notebook is our key deliverable and contains the answers to the above questions.
The data comes from two Kaggle datasets: https://www.kaggle.com/olistbr/brazilian-ecommerce and https://www.kaggle.com/olistbr/marketing-funnel-olist.
- The relevant data was queried from the table and stored as a Pandas DataFrame.
- Data manipulation was undertaken as required (e.g. creating feature columns).
- EDA and visualisations were created.
- A time series ARIMA model was used to forecast future sales.
In conclusion:
- Only 3% of all customers are recurring; the remaining 97% were purchasers for just under one year.
- Total revenues across 29 segments came in at 664,858 in the first eight months of 2018. The biggest segment was 'watches', which generated 17.4% of total revenues.
- The best-selling categories are watches and audio.
- Although the 'watches' segment accounts for the largest share of revenue, it has only two sellers. Furthermore, the leading seller generated 97.0% of the segment's revenue.
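Segment share and seller concentration figures like those above can be computed with a couple of group-bys. The sketch below uses a small invented table, so its numbers are not the Olist results, and the column names `category`, `seller_id` and `revenue` are hypothetical.

```python
import pandas as pd

# Invented sales records for illustration (NOT the Olist data).
sales = pd.DataFrame({
    "category": ["watches", "watches", "audio", "audio", "toys"],
    "seller_id": ["s1", "s2", "s3", "s4", "s5"],
    "revenue": [900.0, 100.0, 500.0, 300.0, 200.0],
})

# Share of total revenue per category, largest first.
by_category = sales.groupby("category")["revenue"].sum()
category_share = (by_category / by_category.sum()).sort_values(ascending=False)
top_category = category_share.index[0]

# Seller concentration within the largest category.
top_sales = sales[sales["category"] == top_category]
by_seller = top_sales.groupby("seller_id")["revenue"].sum()
n_sellers = by_seller.size
leading_seller_share = by_seller.max() / by_seller.sum()

print(category_share)
print(top_category, n_sellers, leading_seller_share)
```

A high leading-seller share in the largest segment, as reported above for 'watches', means that segment's revenue depends heavily on a single seller.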
Please feel free to contact me at kristinelpetrosyan@gmail.com.