Retail Price Optimization

Contributors: Greer Inns, Sarah Kim, Nolan Tonthat, Sam Spillane, Nazeli Mnatsakanyan, Jing Sy


Problem/Objective

Setting the right price for products is essential to maximizing profits and meeting business goals. However, determining the optimal price for each product can be a complex and time-consuming task. The challenge many businesses face is efficiently leveraging their data to make informed pricing decisions that balance profitability and market demand.

Solution

We address this problem by using regression models (linear regression, random forest, and XGBoost) to analyze products and categories, competitive pricing, and customer ratings from the Amazon Product Sales Dataset and recommend optimal prices for various products. This enables retailers to make strategic pricing decisions.

Process

1. ETL/EDA

  • Utilize Pandas:
    • Load data, basic exploration (data structure), data cleaning (e.g., df.isnull()), data transformation (e.g., df.drop, pd.to_datetime, df.sort_values)
  • Utilize Matplotlib, Seaborn:
    • Visualize and identify overall trends and patterns before building models (a heatmap sketch follows this list):
      • Correlation Heatmap
      • Total Price by Quantity Sold
      • Total Price by Quantity Sold in Garden Tools
      • Sales Volume, Revenue, and Customers Across the Product Categories
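
As one example, a minimal sketch of the correlation heatmap, assuming df is the cleaned DataFrame from the pandas step (the column selection is illustrative):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Correlation heatmap over the numeric columns of the cleaned DataFrame
numeric = df.select_dtypes(include="number")
sns.heatmap(numeric.corr(), cmap="coolwarm", center=0)
plt.title("Correlation Heatmap")
plt.tight_layout()
plt.show()
```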


  • Utilize sqlite3:
    • Load the cleaned data into a SQL database titled 'pricing_opt.db'
    • Export the pandas df as the table 'retail' (see the sketch below)
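
A minimal sketch of this ETL flow; the CSV path and the column names (month_year, product_id) are assumptions about the dataset:

```python
import sqlite3

import pandas as pd

# Load and explore the raw data (file name is an assumption)
df = pd.read_csv("retail_price.csv")
print(df.shape)
print(df.isnull().sum())          # check for missing values

# Clean and transform
df = df.drop(columns=["unused_col"], errors="ignore")   # hypothetical column
df["month_year"] = pd.to_datetime(df["month_year"])     # assumed date column
df = df.sort_values(["product_id", "month_year"])       # assumed key columns

# Export the cleaned DataFrame to the 'retail' table in pricing_opt.db
con = sqlite3.connect("pricing_opt.db")
df.to_sql("retail", con, if_exists="replace", index=False)
con.close()
```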

2. Build, Train, Test, Evaluate ML Models

After reading in the cleaned data from our SQLite database, we tested both cumulative and sliding window train/test selection methods to optimize our initial regression model:

Cumulative Training:

  • For cumulative training, we started with an initial training dataset that included data from the beginning of our time series up to a defined point.
  • We then trained our linear regression model on this dataset.
  • After training, we made predictions for a future time period.
  • We then took the actual observed data for that period and added it to our training dataset.
  • We then made predictions for the next time step and repeated the process (a sketch follows below).
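
A minimal sketch of this expanding-window loop for one product, assuming features and target are time-ordered NumPy arrays (names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def cumulative_backtest(features, target, initial_size=5):
    """Train on all history up to each step, predict one step ahead,
    then fold the observed value into the training set."""
    errors = []
    for end in range(initial_size, len(target)):
        model = LinearRegression()
        model.fit(features[:end], target[:end])        # everything seen so far
        pred = model.predict(features[end:end + 1])    # next-step prediction
        errors.append(mean_squared_error(target[end:end + 1], pred))
    return float(np.mean(errors))
```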

(Figure: Cumulative Selection)

Sliding Window Training:

  • For sliding window training, we started with a fixed-size training window that slid forward through our time series data.
  • We then trained our linear regression model on this window.
  • After training, we made predictions for a future time period.
  • We then slid the window forward, dropping the oldest observations and adding the newly observed data, keeping the training set at a fixed size.
  • We then made predictions for the next time step and repeated the process.
  • We found better performance for optimization with the sliding window, likely because, for price optimization, historical data eventually loses some relevance.
  • Following the selection of this method, we created a window size optimization visual to select our window size of 5 (a sketch of the search follows below).
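
A minimal sketch of that window-size search; the placeholder arrays stand in for one product's time-ordered features and prices:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def sliding_backtest(features, target, window):
    """Train on only the last `window` points at each step, predict the next."""
    errors = []
    for end in range(window, len(target)):
        model = LinearRegression()
        model.fit(features[end - window:end], target[end - window:end])
        pred = model.predict(features[end:end + 1])
        errors.append(mean_squared_error(target[end:end + 1], pred))
    return float(np.mean(errors))

# Placeholder data; in the project these come from the 'retail' table
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 3))
target = rng.normal(size=60)

# Sweep candidate window sizes and keep the one with the lowest average MSE
scores = {w: sliding_backtest(features, target, w) for w in range(2, 11)}
best_window = min(scores, key=scores.get)   # 5 on the real data
```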

(Figures: sliding window selection and window size optimization)

We then applied three supervised machine learning models to find ideal price points, all using the sliding window method for data selection:

Linear Regression

  • Step 1: Define Window Size (5)
  • Step 2: Group by product and set features/target variable
  • Step 3: Sliding Window training and testing to predict data points
  • Step 4: Linear Regression Model Training using training data (features_train, target_train)
  • Step 5: Unit price predictions (features_test)
  • Step 6: Evaluate using Mean Squared Error (MSE) - actual vs. predicted
  • Step 7: Calculate the average predicted price and MSE, and loop to find the price with the minimum MSE (the optimal price); a sketch of the full loop follows below
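
A minimal sketch of steps 1 through 7; the feature and key column names (product_id, month_year, qty, freight_price, fp1, unit_price) are assumptions about the retail table:

```python
import sqlite3

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

WINDOW = 5                                                    # Step 1

con = sqlite3.connect("pricing_opt.db")
df = pd.read_sql("SELECT * FROM retail", con)
con.close()

results = {}
for product_id, group in df.groupby("product_id"):            # Step 2
    group = group.sort_values("month_year")
    features = group[["qty", "freight_price", "fp1"]].values  # assumed features
    target = group["unit_price"].values                       # target variable

    preds, errors = [], []
    for end in range(WINDOW, len(target)):                    # Step 3: slide window
        model = LinearRegression()                            # Step 4: train
        model.fit(features[end - WINDOW:end], target[end - WINDOW:end])
        pred = model.predict(features[end:end + 1])           # Step 5: predict price
        preds.append(float(pred[0]))
        errors.append(mean_squared_error(target[end:end + 1], pred))  # Step 6: MSE

    if errors:                                                # Step 7: summarize
        best = int(np.argmin(errors))
        results[product_id] = {
            "avg_predicted_price": float(np.mean(preds)),
            "avg_mse": float(np.mean(errors)),
            "optimal_price": preds[best],                     # price with min MSE
        }
```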

Random Forest

  • Step 1: Define Window Size (5)
  • Step 2: Group by product and set features/target variable
  • Step 3: Sliding Window training and testing to predict data points
  • Step 4: Random Forest Regressor Model Training using training data (features_train, target_train)
  • Step 5: Unit price predictions (features_test)
  • Step 6: Evaluate using Mean Squared Error (MSE) - actual vs. predicted
  • Step 7: Calculate the average predicted price and MSE, and loop to find the price with the minimum MSE (the optimal price); the estimator swap is sketched below
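
The loop above stays the same for random forest; only the estimator changes (the random_state value here is an assumption):

```python
from sklearn.ensemble import RandomForestRegressor

# Replaces LinearRegression() inside the sliding-window loop
model = RandomForestRegressor(n_estimators=50, random_state=42)
```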


Random Forest Optimization, Standard Scaling, and Feature Importance

  • In our random forest model, we also needed to determine the optimal number of decision trees, which we explored through the visualization below.
  • As the visualization did not show that the number of trees had a significant effect on the results, we set the number of trees to 50.

(Figure: random forest performance vs. number of decision trees)

  • Additionally, in this model we looked at feature importance across the 415 iterations of the model over the given window and time series; the iterations are shown by the colors in the stacked bar graph below.
  • Here we see that features such as volume and product rating have little to no impact in our model.
  • Freight price and competitor freight price have the highest impact (a sketch of how the importances were collected follows below).

(Figure: feature importance across model iterations)

  • When standard scaling was applied, we saw an increase in average MSE and a nearly identical feature importance profile, likely due to very small variance in the smaller features whose units are not comparable.
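
A minimal sketch of how the per-iteration importances can be collected; the placeholder arrays stand in for one product's data:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

WINDOW = 5
rng = np.random.default_rng(0)
features = rng.normal(size=(60, 6))   # placeholder feature matrix for one product
target = rng.normal(size=60)          # placeholder unit prices

importances = []
for end in range(WINDOW, len(target)):
    model = RandomForestRegressor(n_estimators=50, random_state=42)
    model.fit(features[end - WINDOW:end], target[end - WINDOW:end])
    importances.append(model.feature_importances_)   # one row per iteration

# (iterations x features) array that feeds the stacked bar chart
imp = np.vstack(importances)
```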

XGBoost

  • Step 1: Define Window Size (5)
  • Step 2: Group by product and set features/target variable
  • Step 3: Sliding Window training and testing to predict data points
  • Step 4: Create an XGBoost regression model with 50 estimators and a fixed random state
  • Step 5: Unit price predictions (features_test)
  • Step 6: Evaluate using Mean Squared Error (MSE) - actual vs. predicted
  • Step 7: Calculate the average predicted price and MSE, and loop to find the price with the minimum MSE (the optimal price); the estimator swap is sketched below
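
Again the same loop, with the XGBoost regressor swapped in (the random_state value is an assumption):

```python
from xgboost import XGBRegressor

# Replaces the estimator inside the sliding-window loop
model = XGBRegressor(n_estimators=50, random_state=42)
```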


3. Export All Data and Visualize Model Performance/Evaluation

  • Visualize the overall EDA process
  • Visualize and compare model accuracy across all 3 methods
  • Visualize model performance across iterations/samples (a chart sketch follows below)
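
A minimal sketch of the model comparison chart; the MSE values here are placeholders for the averages computed in the loops above:

```python
import matplotlib.pyplot as plt

# Placeholder averages; the real values come from each model's sliding-window run
avg_mse = {"Linear Regression": 1.0, "Random Forest": 0.9, "XGBoost": 0.8}

plt.bar(list(avg_mse.keys()), list(avg_mse.values()))
plt.ylabel("Average MSE")
plt.title("Model Performance Comparison")
plt.tight_layout()
plt.show()
```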

(Figures: Model Performance)

See Public Dashboards 1 and 2: https://public.tableau.com/app/profile/jing.sy/viz/Price_Optimization_Final/Dashboard1?publish=yes

Average accuracies for the prediction phase of each model are outlined below. However, since price optimization was the goal of the project, our visualizations focus on MSE and on selecting the price with the minimum MSE; accuracy measurements carry less weight there, as there is no longer a known optimal price to compare against:

  • Linear Regression (sliding window): 56.6566%
  • Random Forest: 56.6566%
  • XGBoost: 60.0414%
  • Linear Regression (cumulative): 56.3470%