P4-Superstore-Sales

Exploratory, Cluster, and Time Series Analyses

Context

From the Kaggle Data Card section:

"With growing demands and cut-throat competitions in the market, a Superstore Giant is seeking your knowledge in understanding what works best for them. They would like to understand which products, regions, categories and customer segments they should target or avoid."

Workflow

Before any aggregation or analysis, the Excel workbook "superstore.xlsx" was converted to a MySQL database (superstore). Queries live in saved script files, which may alter existing tables or create new ones; aggregated results may be held temporarily in Pandas DataFrames to make visualization and further analysis easier. The goal of this project is twofold: to learn MySQL through application, and to provide insights along the way.
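
As a rough illustration of the query-then-DataFrame step described above, here is a minimal sketch. It is not the project's actual code: an in-memory SQLite database stands in for the MySQL superstore database so the example runs anywhere, and the table name (orders), the column names, and the four rows are all made up for illustration.

```python
import sqlite3

import pandas as pd

# Assumption: SQLite in memory stands in for the MySQL `superstore` database.
# The `orders` table, its columns, and the rows below are hypothetical toy data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, category TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("West", "Furniture", 200.0),
        ("West", "Technology", 300.0),
        ("East", "Furniture", 150.0),
        ("East", "Office Supplies", 50.0),
    ],
)
conn.commit()

# Aggregate in SQL, then hold the result temporarily in a DataFrame
# for plotting or further analysis.
query = """
    SELECT region, SUM(sales) AS total_sales
    FROM orders
    GROUP BY region
    ORDER BY total_sales DESC
"""
sales_by_region = pd.read_sql_query(query, conn)
conn.close()

print(sales_by_region)
```

Against a real MySQL server, the same pd.read_sql_query call would typically be given a SQLAlchemy engine instead of the SQLite connection; the connection details depend on the local setup and are not shown here.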

Goals

  • Aggregate sales data
  • Identify top-performing products and regions
  • Investigate opportunities for prescriptive analytics
  • Examine clustering and forecasting options

Citations

  • Box, George E.P., Gwilym M. Jenkins, Gregory C. Reinsel, and Greta M. Ljung. “9 Analysis of Seasonal Time Series.” Essay. In Time Series Analysis: Forecasting and Control, 5th ed., 305–51. Hoboken, New Jersey: John Wiley & Sons, Inc., 2016.
  • Chowdhury, Vivek. “Superstore Dataset.” Kaggle, February 17, 2022.
  • Hyndman, R. J., and G. Athanasopoulos. “8.9 Seasonal ARIMA Models.” Essay. In Forecasting: Principles and Practice, 2nd ed. Melbourne, Australia: OTexts, 2018. https://otexts.com/fpp2/seasonal-arima.html.
  • Keany, Eoghan. “The Ultimate Guide for Clustering Mixed Data.” Medium, November 30, 2021. https://medium.com/analytics-vidhya/the-ultimate-guide-for-clustering-mixed-data-1eefa0b4743b.
  • Martin, Michael. “Sample - Superstore.” Tableau Community, April 29, 2020.
  • Wikipedia contributors. “Augmented Dickey–Fuller Test.” Wikipedia, September 19, 2023.

Reflection

This turned out to be quite the passion project. It tied together a level of organization, research, and critical thinking, particularly around model selection, that I hope to improve upon in future analyses. I believe I learned more in this project than in the three previous ones combined, and I enjoyed it thoroughly. In hindsight, Microsoft Power BI could have been used from the start to reduce the amount of code needed for visualization, and potentially its memory footprint.