

Primary language: R. License: GNU General Public License v3.0 (GPL-3.0).

Demand Forecasting

A playground for forecasting in R using Prophet.

Perform fine-grained forecasting in an efficient manner, leveraging the distributed computational power of the Databricks Platform.

Prophet is a forecasting tool that uses an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. The library is especially suited for business forecasts like forecasting website traffic, sales, and inventory.
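As a rough illustration of the additive structure described above, the following minimal sketch fits Prophet to a synthetic daily series and checks that the prediction equals the trend plus the summed seasonal terms. The data, the seed, and the variable names (`history`, `additive_gap`) are assumptions made for this example and are not taken from the repository.

```r
library(prophet)

set.seed(42)

# Synthetic daily series (illustrative only, not the Kaggle data):
# a slow upward trend, a weekly cycle, and Gaussian noise.
n <- 730
history <- data.frame(ds = seq(as.Date("2015-01-01"), by = "day", length.out = n))
day_of_week <- as.numeric(format(history$ds, "%u"))  # 1 = Monday ... 7 = Sunday
history$y <- 50 + 0.02 * seq_len(n) + 8 * sin(2 * pi * day_of_week / 7) + rnorm(n, sd = 2)

# Fit the model; seasonalities are set explicitly here rather than auto-detected.
m <- prophet(
  history,
  yearly.seasonality = TRUE,
  weekly.seasonality = TRUE,
  daily.seasonality = FALSE
)

# Extend the time axis 90 days past the history and predict.
future <- make_future_dataframe(m, periods = 90)
forecast <- predict(m, future)

# In the default additive mode, yhat is the trend plus the summed seasonal terms,
# so this gap should be zero up to floating-point error.
additive_gap <- max(abs(forecast$yhat - (forecast$trend + forecast$additive_terms)))

tail(forecast[, c("ds", "trend", "weekly", "yearly", "yhat")])
```

The `weekly` and `yearly` columns hold the individual seasonal components, so they can be inspected separately from the trend.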

The original paper from the Facebook team is in the file Forecasting_at_Scale.pdf.

Dataset

The tests use the Store Item Demand Forecasting dataset from Kaggle. The data are in the dataset directory.

Kaggle Dataset: https://www.kaggle.com/datasets/aswinkum/demand-forecasting-kernels-only

The dataset contains 5 years of daily store-item unit sales data for 50 items across 10 different stores (913,000 observations).
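The observation count can be sanity-checked with a few lines of base R. This sketch assumes one row per day, store, and item, and a date range of 2013-01-01 through 2017-12-31; that range is an assumption here, since the text only says "5 years".

```r
# Assumed date range (not stated in the text): 2013-01-01 through 2017-12-31.
days <- seq(as.Date("2013-01-01"), as.Date("2017-12-31"), by = "day")
n_days <- length(days)          # includes the leap day in 2016
n_stores <- 10L
n_items <- 50L

n_series <- n_stores * n_items  # one time series per store-item pair
n_rows <- n_days * n_series     # one row per day and series

cat("days:", n_days, "series:", n_series, "rows:", n_rows, "\n")
```

Under that assumption the product is 1,826 days times 500 series, which matches the 913,000 observations quoted above.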

Forecasting in RStudio

The R source file for the forecasting is Forecasting.R.

Read the comments for step-by-step execution of the forecast and plotting.

Forecasting in Databricks

The Databricks notebook for forecasting at scale is in the file OE_Demand_Forecasting.dbc.

The notebook generates 500 forecasts, one for each store-item pair (10 stores × 50 items).
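The notebook itself is not reproduced here. As a local sketch of the general idea of fitting one model per store-item series, the code below splits a data frame by store and item and forecasts each group with Prophet. The function name `forecast_series`, the `horizon` parameter, the column names (`date`, `store`, `item`, `sales`), and the synthetic data are all assumptions for illustration.

```r
library(prophet)

# Fit one Prophet model to a single store-item series and return its forecast.
# `forecast_series` and `horizon` are hypothetical names for this sketch.
forecast_series <- function(series, horizon = 90) {
  history <- data.frame(ds = as.Date(series$date), y = series$sales)
  m <- prophet(history, daily.seasonality = FALSE)
  # Only future dates are requested, so the result has `horizon` rows.
  future <- make_future_dataframe(m, periods = horizon, include_history = FALSE)
  fc <- predict(m, future)
  data.frame(
    store = series$store[1],
    item = series$item[1],
    ds = as.Date(fc$ds),
    yhat = fc$yhat,
    yhat_lower = fc$yhat_lower,
    yhat_upper = fc$yhat_upper
  )
}

set.seed(1)
# Synthetic stand-in for train.csv (assumed columns: date, store, item, sales),
# reduced to 2 stores x 2 items so the sketch runs quickly.
dates <- seq(as.Date("2016-01-01"), by = "day", length.out = 730)
train <- data.frame(
  date = rep(dates, times = 4),
  store = rep(c(1, 1, 2, 2), each = length(dates)),
  item = rep(c(1, 2, 1, 2), each = length(dates))
)
train$sales <- rpois(nrow(train), lambda = 20 + 5 * train$item)

# Split into one data frame per store-item pair, forecast each, and stack the results.
groups <- split(train, list(train$store, train$item), drop = TRUE)
forecasts <- do.call(rbind, lapply(groups, forecast_series, horizon = 30))

table(forecasts$store, forecasts$item)
```

On Databricks, a per-series function of this shape could plausibly be handed to sparklyr's `spark_apply()` with `group_by = c("store", "item")` so that the groups are processed in parallel. Whether the notebook is structured this way is an assumption.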

Read the training file into a Spark DataFrame

Upload the ./dataset/train.csv file to the catalog as a new table using the Data Ingestion option in the Databricks main interface.

Change the notebook code below accordingly:

train_data = spark_read_csv(sc, name = "train_data",  path = "/Volumes/industry_solutions/esg_scoring/data/train.csv")

AutoML auto-generated notebook

The auto-generated Databricks notebook with the best AutoML model is in the file AutoML-24-04-30-16_08-Prophet.dbc.

With this file, it is possible to re-run the experiment on forecasting 500 time series and see the results.