This repository contains a sales forecasting project that predicts the number of units sold per product across multiple stores. The model uses historical sales data to forecast future sales. The project is written in Python and uses pandas for data handling, scikit-learn for modeling, and Matplotlib and Seaborn for visualization.
The dataset comes from Kaggle and spans 10 years (2010-2019), covering multiple stores and products. Each record has the following fields:
- Date: The date of the sales data.
- Store ID: Identifier for the store.
- Product ID: Identifier for the product.
- Number Sold: The number of units sold.
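
The fields above can be loaded and checked with pandas roughly as follows. This is a minimal sketch: the column names `date`, `store`, `product`, and `number_sold` are assumptions about how the CSV headers are spelled, the helper `load_sales` is hypothetical, and the inline sample rows are made up for illustration rather than taken from the Kaggle data.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for the Kaggle CSV; column names are assumed.
SAMPLE_CSV = """date,store,product,number_sold
2010-01-01,0,0,801
2010-01-02,0,0,810
2010-01-01,0,1,845
2010-01-02,0,1,851
2010-01-01,1,0,790
2010-01-02,1,0,795
"""


def load_sales(source) -> pd.DataFrame:
    """Read sales records, parse dates, and sort each series chronologically."""
    df = pd.read_csv(source, parse_dates=["date"])
    expected = {"date", "store", "product", "number_sold"}
    missing = expected - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    # One time series per (store, product) pair, ordered by date.
    return df.sort_values(["store", "product", "date"]).reset_index(drop=True)


sales = load_sales(io.StringIO(SAMPLE_CSV))
n_series = sales.groupby(["store", "product"]).ngroups
print(sales.dtypes)
print(f"{n_series} store/product series")
```

Passing a file path instead of the `io.StringIO` object works the same way, since `pd.read_csv` accepts both.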

The project covers the following steps:

- Exploratory Data Analysis (EDA) to understand sales trends and seasonality.
- Data Preprocessing, including feature engineering to add lagged variables.
- Modeling with a Random Forest Regressor, chosen for its robustness in capturing complex patterns.
- Model Evaluation using Mean Absolute Percentage Error (MAPE), with a MAPE of 1.23% on the test set.
- Feature Importance Analysis to identify which features contribute most to the model's predictions.
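
The preprocessing, modeling, and evaluation steps above might look roughly like the sketch below. It is not the project's actual code: it runs on synthetic data, and the column names, the lag choices (1 and 7 days), the 60-day hold-out window, and the helpers `make_synthetic_sales` and `add_lag_features` are all assumptions made for illustration. It will not reproduce the 1.23% figure.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_percentage_error


def make_synthetic_sales(n_days=400, n_stores=2, n_products=2, seed=0):
    """Build made-up daily sales with a weekly cycle (stand-in for the real data)."""
    rng = np.random.default_rng(seed)
    dates = pd.date_range("2018-01-01", periods=n_days, freq="D")
    day_of_week = dates.dayofweek.to_numpy()
    frames = []
    for store in range(n_stores):
        for product in range(n_products):
            base = 800 + 20 * store + 10 * product
            weekly = 30 * np.sin(2 * np.pi * day_of_week / 7)
            noise = rng.normal(0, 5, n_days)
            frames.append(pd.DataFrame({
                "date": dates,
                "store": store,
                "product": product,
                "number_sold": np.round(base + weekly + noise).astype(int),
            }))
    return pd.concat(frames, ignore_index=True)


def add_lag_features(df, lags=(1, 7)):
    """Add lagged sales per (store, product) series plus a day-of-week feature."""
    df = df.sort_values(["store", "product", "date"]).copy()
    grouped = df.groupby(["store", "product"])["number_sold"]
    for lag in lags:
        # shift() within each group so lags never leak across series
        df[f"lag_{lag}"] = grouped.shift(lag)
    df["day_of_week"] = df["date"].dt.dayofweek
    # The first rows of each series have no lag history; drop them.
    return df.dropna().reset_index(drop=True)


data = add_lag_features(make_synthetic_sales())
features = ["store", "product", "day_of_week", "lag_1", "lag_7"]

# Chronological split: the last 60 days are held out as the test set.
cutoff = data["date"].max() - pd.Timedelta(days=60)
train = data[data["date"] <= cutoff]
test = data[data["date"] > cutoff]

model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(train[features], train["number_sold"])

# scikit-learn returns MAPE as a fraction, so format it as a percentage.
mape = mean_absolute_percentage_error(test["number_sold"], model.predict(test[features]))
importances = pd.Series(model.feature_importances_, index=features).sort_values(ascending=False)

print(f"MAPE: {mape:.2%}")
print(importances)
```

Splitting by date rather than at random keeps future observations out of the training set, which matters for time series because lagged features would otherwise leak information from the test period.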