Data4DM/BayesSD

Inventory management casestudy

Closed this issue · 17 comments

The end goal of this model are three:

  • online decision
  • dynamic timestep aggregation
  • dynamic subscript aggregation

The first casestudy is the fusion of Bayequentist from DailyDigest and ManageChain from MonthlyModel. Three Bayesian checks are applied to chained system where demand/supply and material/information are matched. This is the diagram:

image

I aim to address first two topics from #5,

Originally posted by hyunjimoon September 12, 2022

Goal

to figure out the best way to tune the following hyper-parameters and its useful resource. I left the link for each discussion (#12, #7, #9, #11) as they need active development, so please leave comment in each link! If you think of any other tunable parmameters, please leave a comment so that I can allocate a new discussion thread for that.

1. Typify model into PA or PAD based on the research purpose in #12

2. Typify parameters to assumed or assumed time-series or estimated for testing in #7

  • this discussion on holding constant (clamping) between two leaders (Andrew Gelman and Judea Pearl)

@tomfid the following are analysis code and vensim file readied for the above goal. Ready to go!

It is recommended to make each commit unified so that communication could be like: "see commit 684dfb1 for the update" but will do better next time.

First fit success with this model!
image
Only the second moment is incorrect in the first run (row), while both the first and second moment is biased (second row).
The plot below compares the prior draws (true) and estimated values:
image

true dist:
image

Digging into dimension of prior_pred, post_draws in #23 (comment)

I need help in using driving data using vensim's lookup function structure.

In other words, using lookups as a container for data? I wouldn't normally do that, unless I wanted random access to the data, or was planning to export to some other tool that couldn't digest Vensim data variables.

One possible strategy is to create placeholder lookups with generic values, then populate them at runtime with a .cin file (perhaps generated in Excel) containing the real values. This won't work if the goal is to export the .mdl though. In that case, with a little finagling, you could just generate the lookups in mdl syntax and paste them into the equation listing.

There are two forms in mdl syntax:

https://www.vensim.com/documentation/lookups.html

LOOKUP NAME([(Xmin, Ymin)-(Xmax, Ymax),(Xref1, Yref1),(Xref2,Yref2),...(Xrefn,Yrefn)]

(X1, Y1), (X2,Y2), . . .(Xn, Yn) )
~ Units ~ Description |

The stuff in [ ] can be omitted.

or:

LOOKUP NAME( X1, X2, X3,....Xn, Y1, Y2, Y3,....Yn)
~ Units ~ Description |

.cin syntax is the same - just make sure it's all on one line, and omit the ~ and stuff after.

This attempt to use lookup function for driving data input structure started from digging into how pysd is using external data via initializing the model with .initialize(data_file = .tab). SDXorg/pysd#371

To introduce vensim2stan translator, the following must be completed by 10/21.

Must to do

a. prove vensim and stan draws2data generate equal results (with blue variables as vdfx or csv)
b. data2draws2 debug
c. plot using xarray or arviz inferencedata (how many draws, default 1000)
d. analyze fit, increase the number of estimated variable from 2 to 6

Can do

e. data2draws2 loglik computation
f. loo computation and PIT
g. extend to hierarchical model
h. how to incorporate hierarchical modeling with testing framework to make the test result less path dependent? Should we start from no pooling or complete pooling? #34
e. implement hierarchical Bayes in vensim + Venpy

Done

> prior_pred_obs['production_start_rate_stocked_obs'].mean('draw').values.flatten()
Out[26]: 
array([-4.40146950e-03,  3.36716915e-03,  7.66082443e-03,  1.60187649e-02,
       -3.77674004e-03,  4.27628952e-03,  1.13146864e-03, -4.19492450e-03,
       -6.59530433e-03, -4.86387919e-03,  5.67682390e-03,  5.63749278e+09,
        5.10863331e+09,  2.47952251e+09,  4.95773346e+06,  5.53670186e-01,
       -1.44968518e-03,  1.15607441e-02,  5.49195196e-03,  7.08743052e-03])

Pass generated (via draws2data) vectors (averaged across 1,000 different prior draws) to data2draws.

ran draws2data for inventory management after transforming two flow rates (production start rate, production rate) into stocks by adding process noise then generated
image
image

  • tracing spikes and negative values
prior_pred_obs
Out[17]: 
<xarray.Dataset>
Dimensions:                            (chain: 1, draw: 1000,
                                        production_start_rate_stocked_obs_dim_0: 20,
                                        production_rate_stocked_obs_dim_0: 20)
Coordinates:
  * chain                              (chain) int64 1
  * draw                               (draw) int64 0 1 2 3 ... 996 997 998 999
Dimensions without coordinates: production_start_rate_stocked_obs_dim_0,
                                production_rate_stocked_obs_dim_0
Data variables:
    production_start_rate_stocked_obs  (chain, draw, production_start_rate_stocked_obs_dim_0) float64 ...
    production_rate_stocked_obs        (chain, draw, production_rate_stocked_obs_dim_0) float64 ...

image

First estimation success with realistic values:

  1. put real values in vensim (no int)
  2. unify mean of est_param in vensim and in stanify set_prior command
Dimensions:                              (draw: 100, chain: 2,
                                          initial_outcome_dim_0: 7,
                                          integrated_result_dim_0: 20,
                                          integrated_result_dim_1: 7,
                                          work_in_process_inventory_dim_0: 20,
                                          expected_order_rate_dim_0: 20,
                                          process_noise_dim_0: 20,
                                          production_start_rate_stocked_dim_0: 20,
                                          production_rate_stocked_dim_0: 20,
                                          backlog_dim_0: 20, inventory_dim_0: 20)
Coordinates:
  * chain                                (chain) int64 1 2
  * draw                                 (draw) int64 0 1 2 3 4 ... 96 97 98 99
Dimensions without coordinates: initial_outcome_dim_0, integrated_result_dim_0,
                                integrated_result_dim_1,
                                work_in_process_inventory_dim_0,
                                expected_order_rate_dim_0, process_noise_dim_0,
                                production_start_rate_stocked_dim_0,
                                production_rate_stocked_dim_0, backlog_dim_0,
                                inventory_dim_0
Data variables: (12/19)
    inventory_adjustment_time            (chain, draw) float64 6.306 ... 1.146
    minimum_order_processing_time        (chain, draw) float64 0.4911 ... 0.4497
    m_noise_scale                        (chain, draw) float64 0.03948 ... 0....
    work_in_process_inventory__init      (chain, draw) float64 422.2 ... 422.2
    expected_order_rate__init            (chain, draw) float64 10.01 ... 10.01
    process_noise__init                  (chain, draw) float64 0.1209 ... 0.1209
    ...                                   ...
    expected_order_rate                  (chain, draw, expected_order_rate_dim_0) float64 ...
    process_noise                        (chain, draw, process_noise_dim_0) float64 ...
    production_start_rate_stocked        (chain, draw, production_start_rate_stocked_dim_0) float64 ...
    production_rate_stocked              (chain, draw, production_rate_stocked_dim_0) float64 ...
    backlog                              (chain, draw, backlog_dim_0) float64 ...
    inventory                            (chain, draw, inventory_dim_0) float64 ...

image

Discussion with Hazhir
0. was S (the number of prior draws)
image

image

Recommend `loglik` as reliable diagnostics

image

  1. [active initial in vensim?] (https://www.vensim.com/documentation/fn_active_initial.html)
    Cyclic initialization; meaning of the following?

This will cause Capacity to be initialized at 100; then the first value of target capacity will be Capacity * adjust from utilization. In general, this will not be 100; the initial expression is used only to compute the initial conditions of State variables.

  1. SM posterior samples representing parameter and data joint (theta, y)

a. S = 2, M = 100, N = 20 weeks
200 theta posterior samples from joint space (theta, y)

b. S = 4, M = 50, N = 20 weeks
200 theta posterior samples from joint space (theta, y)

prior and likelihood distribution for draws2data, data2draws

  • inv_gamma(2,1) leads to "normal scale nan"
setting_assumption = {
    "est_param_scalar" : ("inventory_adjustment_time", "minimum_order_processing_time"),
    "ass_param_scalar" : ("inventory_coverage", "manufacturing_cycle_time", "safety_stock_coverage", "time_to_average_order_rate", "wip_adjustment_time"),
    "target_simulated_vector" : ("production_start_rate_stocked", "production_rate_stocked"),
    "driving_data_vector" : ("customer_order_rate", "process_noise_std_norm_data", "production_start_rate_m_noise_trun_norm_data", "production_rate_m_noise_trun_norm_data"),
    "model_name": "mngIven",
    "integration_times": list(range(1, n_t + 1)),
    "initial_time": 0.0
}

## numeric data (using values in vensim; cannot specify)
numeric_assumption = {
    "n_t": n_t,
    "customer_order_rate": pd.read_csv('data/example_retail_sales.csv').iloc[:n_t, 1].values,
    "process_noise_std_norm_data": np.random.normal(0,1, size=n_t),
    "production_start_rate_m_noise_trun_norm_data": truncnorm.rvs(0, 2, size=n_t),
    "production_rate_m_noise_trun_norm_data": truncnorm.rvs(0, 2, size=n_t),
    "production_start_rate_stocked_obs": truncnorm.rvs(0, 2, size=n_t),
    "production_rate_stocked_obs": truncnorm.rvs(0, 2, size=n_t),
}

model = StanVensimModel(structural_assumption, setting_dict = setting_assumption, numeric_assump_dict = numeric_assumption)

model.set_prior("inventory_adjustment_time", "normal", 2, 0.4, lower=0)
model.set_prior("minimum_order_processing_time", "normal", 0.05, 0.01, lower=0)
model.set_prior("m_noise_scale", "normal", 0.1, 0.02, lower = 0)
  1. PD model enforce symmetry of draws2data and data2draws while PAD allows asymmetry; data2draws is blackboxed map of $P_\phi(.|Y)$. #12 (comment)

  2. plotting prior distribution vs aggregated posterior samples
    image
    image

  3. Have you seen any seasonality data from epidemics #32 (comment)

B.

  1. two different ways of random variable input:
    theta ~ N(2,1)

G1: Y|theta = 1
G2: Y|theta = 2

Y = .5(P(Y1 |theta =1) + P(Y1 |theta =2))

Estimate theta from Y

  1. approximate: symmetric

Discussion on the value of one random variable (prior) can change in one vensim run or fixed (if later, we run S numbers of vensim with different value of parameter then aggregate outcomes)

Because of the implementational restriction of stochasticity, we are enforcing time-varying concepts in vensim?

  1. $\theta \sim N(3, .4)$
  2. random variable
  • driving data: 2, 3, 2, 3,...
    comparing the probability of our implemented / numerically computed result.

Y_0
Y_1, Y_2
Predictive distirbution, or the p of realized data from the assumed model is greater when we assume homogeneity across time and heterogeneity across space.
P(Y_0|M) < P(.5(Y_1 + Y_2)|M)
image

Filtering is before ergodic or stationary or where time hetereogenity is not settled yet
\theta_t-1 =0.5
\theta_{t} =? generates samples from prior (sampled from a prior distribution = N(\theta_t-1 , 1)

p(D|\theta_t1) + p(D|\theta_t2) + p(D|\theta_t3) + p(D|\theta_t4)

@tomfid I am following your comment on using reference throughput. Thanks. However, I faced the following active initial error and I am not sure about its cause. Especially I don't follow how Process noise std norm data can be initialized in two different ways (leading to unmatched initial and active).
image

image

- shadow structure success

image

  • reference initial stock