sibylhe/mmm_stan

Plug in for Media competitors and AdjR2

Emnlv opened this issue · 4 comments

Emnlv commented

Hi Sibyl,

Sorry I thought to asnwer you about this question.
"Why do you want to avoid 0's, or what's your purpose for changing the adstock? I don't understand your problem and goal, if you let me know, I might have more specific answer" At the end you were right, it didn't change anything. So it's correct to leave the code like this.

I am sharing some chunks of the code that I added/modified, hoping they can be helpful.
Here I add a new part for the First Model to have the R2 and the AdjR2 (I don't know how much can be helpful, but I hope a little bit). Just plug at the end of the Stan Code.

###FIRST MODEL

###STAN CODE

  generated quantities {
  real rss;                
  real totalss;
  real <upper=1> R2 ;
  real <upper=1> AdjR2;                 
  vector[N] mu;
  
  mu <- X1 * beta1 + X2 * beta2 + X3 * beta3 + alpha;
  rss <- dot_self(y-mu);
  totalss <- dot_self(y-mean(y));
  R2 <- 1 - rss/totalss;
  AdjR2 <- 1 - ((1-R2)*(N-1))/(N - (K1+K2+K3) - 1);
}
'''

    sm1 = pystan.StanModel(model_code=ctrl_code1, verbose=True)
    fit1 = sm1.sampling(data=ctrl_data, iter=5000, chains=5, n_jobs=-1, seed=10)
    fit1_result` = fit1.extract()
    fit1_result["R2"].mean()
    fit1_result["AdjR2"].mean()

In the Second Model, I plugged into it a new chunk to have also the media competitors effect (forced as negative). I create a new matrix Z and for it a specific peak and decay. In addition is possible to see also distribution beta for them.
However, I don't know if this is out of the scope of your publication.
Regarding this second step, I have one question. As you can see, I am calculating the y_pred like this:
y_pred = factor_df.drop(mdip_comp_cols, axis =1).apply(np.prod, axis=1)
where "mdip_comp_cols" are the columns of the media competitors. I did this to avoid the negative values since I calculated the beta competitors like this:

for i in range(num_media_comp):
        colname = media_comp_vars[i]
        factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))

Do you know how to calculate the y_pred considering also the effect of the media competitors?
In addition I tried to take it out the R2 and the AdjR2 without having results. At the same time I don't know how much can be helpful to have it. Eventually, in this code I didn't consider the mc_delta for the proportion.

Here you can find the code. I started from the starting point of Second Model because there are many steps in between.
It is only necessary to create this in the first lines of the importing phase also because it is necessary to use them to check for multicollinearity among media:

mdip_comp_cols = [col for col in df.columns if 'atl_' in col]

###FIRST MODEL

df_mmm, sc_mmm = mean_log1p_trandform(df, ['total_volume', 'base_sales'])   
mu_mdip = df[mdip_cols].apply(np.mean, axis=0).values
mu_mdip_comp = df[mdip_comp_cols].apply(np.mean, axis=0).values
max_lag = 8
num_media = len(mdip_cols)
num_media_comp = len(mdip_comp_cols)
X_media = np.concatenate((np.zeros((max_lag-1, num_media)), df[mdip_cols].values), axis=0)
X_comp_media = np.concatenate((np.zeros((max_lag-1, num_media_comp)), df[mdip_comp_cols].values), axis=0)
X_ctrl = df_mmm['base_sales'].values.reshape(len(df),1)

###STAN CODE

model_data2 = {
    'N': len(df),
    'max_lag': max_lag, 
    'num_media': num_media,
    'num_media_comp': num_media_comp,
    'X_media': X_media, 
    'X_media_comp': X_comp_media,
    'mu_mdip': mu_mdip,
    'mu_mdip_comp': mu_mdip_comp,
    'num_ctrl': X_ctrl.shape[1],
    'X_ctrl': X_ctrl, 
    'y': df_mmm['total_volume'].values
}

model_code2 = '''
functions {
  // the adstock transformation with a vector of weights
  real Adstock(vector t, row_vector weights) {
    return dot_product(t, weights) / sum(weights);
  }
}
data {
  // the total number of observations
  int<lower=1> N;
  // the vector of sales
  real y[N];
  // the maximum duration of lag effect, in weeks
  int<lower=1> max_lag;
  // the number of media channels
  int<lower=1> num_media;
  // the number of competitors media for TV 
  int<lower=1> num_media_comp;
  // matrix of media variables
  matrix[N+max_lag-1, num_media] X_media;
  // matrix of media variables
  matrix[N+max_lag-1, num_media_comp] X_media_comp;
  // vector of media variables' mean
  real mu_mdip[num_media];
  // vector of media variables' mean competitors
  real mu_mdip_comp[num_media_comp];
  // the number of other control variables
  int<lower=1> num_ctrl;
  // a matrix of control variables
  matrix[N, num_ctrl] X_ctrl;
}
parameters {
  // residual variance
  real<lower=0> noise_var;
  // the intercept
  real tau;
  // the coefficients for media variables and base sales
  vector<lower=0>[num_media+num_ctrl] beta;
  // the coefficients for media competitors variables
  vector<upper=0>[num_media_comp] beta2;
  // the decay and peak parameter for the adstock transformation of
  // each media
  vector<lower=0,upper=1>[num_media] decay;
  vector<lower=0,upper=ceil(max_lag/2)>[num_media] peak;
  vector<lower=0,upper=1>[num_media_comp] decay2;
  vector<lower=0,upper=ceil(max_lag/2)>[num_media_comp] peak2;
}
transformed parameters {
  // the cumulative media effect after adstock
  real cum_effect;
   // the cumulative media effect after adstock of competitors
  real cum_effect_comp;
  // matrix of media variables after adstock
  matrix[N, num_media] X_media_adstocked;
  // matrix of media variables competitors after adstock
  matrix[N, num_media_comp] Z_media_adstocked;
  // matrix of all predictors
  matrix[N, num_media+num_ctrl] X;
  // matrix of all competitors predictors
  matrix[N, num_media_comp] Z;
  // adstock, mean-center, log1p transformation
  row_vector[max_lag] lag_weights;
  // adstock, mean-center, log1p transformation - for competitors
  row_vector[max_lag] lag_weights_comp;
  
  for (nn in 1:N) {
    for (media in 1 : num_media) {
      for (lag in 1 : max_lag) {
        lag_weights[max_lag-lag+1] <- pow(decay[media], (lag - 1 - peak[media]) ^ 2);
        }
      cum_effect <- Adstock(sub_col(X_media, nn, media, max_lag), lag_weights);
      X_media_adstocked[nn, media] <- log1p(cum_effect/mu_mdip[media]);
     } 
  X <- append_col(X_media_adstocked, X_ctrl);
  }
  for (nn in 1:N) {
    for (media_c in 1 : num_media_comp) {
      for (lag in 1 : max_lag) {
        lag_weights_comp[max_lag-lag+1] <- pow(decay2[media_c], (lag - 1 - peak2[media_c]) ^ 2);
        }
      cum_effect_comp <- Adstock(sub_col(X_media_comp, nn, media_c, max_lag), lag_weights_comp);
      Z_media_adstocked[nn, media_c] <- log1p(cum_effect_comp/mu_mdip_comp[media_c]);
     } 
  Z <- Z_media_adstocked;
  }
}
model {
  decay ~ beta(3,3);
  peak ~ uniform(0, ceil(max_lag/2));
  decay2 ~ beta(3,3);
  peak2 ~ uniform(0, ceil(max_lag/2));
  tau ~ normal(0, 5);
  for (i in 1 : num_media+num_ctrl) {
    beta[i] ~ normal(0, 1);
  }
  for (i in 1 : num_media_comp) {
    beta2[i] ~ normal(0, 1);
  }
  noise_var ~ inv_gamma(0.05, 0.05 * 0.01);
  y ~ normal(tau + X * beta + Z * beta2, sqrt(noise_var));
}
'''

###FITTING AND EXTRAPOLATION FEATURES

sm2 = pystan.StanModel(model_code=model_code2, verbose=True)
fit2 = sm2.sampling(data=model_data2, iter=4000, chains=5, n_jobs=-1, seed=10)
fit2_result = fit2.extract()
stan_utility.check_all_diagnostics(fit2)


def extract_mmm(fit_result, max_lag=max_lag, media_vars=mdip_cols, media_comp_vars = mdip_comp_cols, ctrl_vars=['base_sales'], extract_param_list=True):
    mmm = {}
    mmm['max_lag'] = max_lag
    mmm['media_vars'], mmm['media_comp_vars'], mmm['ctrl_vars'] = media_vars, media_comp_vars, ctrl_vars
    mmm['decay'] = decay = fit_result['decay'].mean(axis=0).tolist()
    mmm['decay2'] = decay2 = fit_result['decay2'].mean(axis=0).tolist()
    mmm['peak'] = peak = fit_result['peak'].mean(axis=0).tolist()
    mmm['peak2'] = peak2 = fit_result['peak2'].mean(axis=0).tolist()
    mmm['beta'] = fit_result['beta'].mean(axis=0).tolist()
    mmm['beta2'] = fit_result['beta2'].mean(axis=0).tolist()
    mmm['tau'] = fit_result['tau'].mean()
    if extract_param_list:
        mmm['decay_list'] = fit_result['decay'].tolist()
        mmm['decay2_list'] = fit_result['decay2'].tolist()
        mmm['peak_list'] = fit_result['peak'].tolist()
        mmm['peak2_list'] = fit_result['peak2'].tolist()
        mmm['beta_list'] = fit_result['beta'].tolist()
        mmm['beta2_list'] = fit_result['beta2'].tolist()
        mmm['tau_list'] = fit_result['tau'].tolist()
    
    adstock_params = {}
    media_names = [col.replace('metric_', '') for col in media_vars]
    for i in range(len(media_names)):
        adstock_params[media_names[i]] = {
            'L': max_lag,
            'P': peak[i],
            'D': decay[i]
        }
    mmm['adstock_params'] = adstock_params
    
    adstock_params_comp = {}
    media_comp_names = [col.replace('atl_', '') for col in media_comp_vars]
    for i in range(len(media_comp_names)):
        adstock_params_comp[media_comp_names[i]] = {
            'L': max_lag,
            'P': (peak2 if type(peak2) == float else peak2[i]),
            'D': (decay2 if type(decay2) == float else decay2[i])
        }
    mmm['adstock_params_comp'] = adstock_params_comp
    return mmm

mmm = extract_mmm(fit2, max_lag=max_lag, media_vars=mdip_cols, media_comp_vars=mdip_comp_cols, ctrl_vars=['base_sales'])

###BETA PLOT

beta_media = {}
for i in range(len(mmm['media_vars'])):
    md = mmm['media_vars'][i]
    betas = []
    for j in range(len(mmm['beta_list'])):
        betas.append(mmm['beta_list'][j][i])
    beta_media[md] = np.array(betas)

f = plt.figure(figsize=(18,15))
for i in range(len(mmm['media_vars'])):
    ax = f.add_subplot(5,3,i+1)
    md = mmm['media_vars'][i]
    x = beta_media[md]
    mean_x = x.mean()
    median_x = np.median(x)
    ax = sns.distplot(x)
    ax.axvline(mean_x, color='r', linestyle='-')
    ax.axvline(median_x, color='g', linestyle='-')
    ax.set_title(md)

###BETA COMP

beta_media2 = {}
for i in range(len(mmm['media_comp_vars'])):
    md = mmm['media_comp_vars'][i]
    betas = []
    for j in range(len(mmm['beta2_list'])):
        (betas.append(mmm['beta2_list'][j]) if type(mmm['beta2']) == float else betas.append(mmm['beta2_list'][j][i]))
    beta_media2[md] = np.array(betas)

f = plt.figure(figsize=(18,15))
for i in range(len(mmm['media_comp_vars'])):
    ax = f.add_subplot(5,3,i+1)
    md = mmm['media_comp_vars'][i]
    x = beta_media2[md]
    mean_x = x.mean()
    median_x = np.median(x)
    ax = sns.distplot(x)
    ax.axvline(mean_x, color='r', linestyle='-')
    ax.axvline(median_x, color='g', linestyle='-')
    ax.set_title(md)

###MEDIA CONTRIBUTION

def mmm_decompose_contrib(mmm, df, original_sales=df['total_volume']):
    # adstock params
    adstock_params = mmm['adstock_params']
    # coefficients, intercept
    adstock_params_comp = mmm['adstock_params_comp']
    beta, beta2, tau = mmm['beta'], mmm['beta2'], mmm['tau']
    # variables
    media_vars, media_comp_vars, ctrl_vars = mmm['media_vars'], mmm['media_comp_vars'], mmm['ctrl_vars']
    num_media, num_media_comp, num_ctrl = len(media_vars), len(media_comp_vars), len(ctrl_vars)
    # X_media2: adstocked, mean-centered media variables + 1
    # It apply adstock formula for every channel, transforming values
    X_media2 = adstock_transform(df, media_vars, adstock_params)
    X_media_comp2 = adstock_transform(df, media_comp_vars, adstock_params_comp)
    # it paramtrize to mean  0 all media sc is mean
    X_media2, sc_mmm2 = mean_center_trandform(X_media2, media_vars)
    X_media_comp2, sc_mmm_comp_2 = mean_center_trandform(X_media_comp2, media_comp_vars)
    #it sums to every channel +1: it means that all 0 are +1
    X_media2 = X_media2 + 1
    X_media_comp2 = X_media_comp2 + 1
    # X_ctrl2, mean-centered control variables + 1
    # It parametrize to mean 0 the base_sales
    X_ctrl2, sc_mmm2_1 = mean_center_trandform(df[ctrl_vars], ctrl_vars)
    X_ctrl2 = X_ctrl2 + 1
    # y_true2, mean-centered sales variable + 1
    y_true2, sc_mmm2_2 = mean_center_trandform(df, ['total_volume'])
    y_true2 = y_true2 + 1
    #it adds the line sc_mmm2_1 and sc_mm2_2 to sc_mmm2 -- why?
    sc_mmm2.update(sc_mmm2_1)
    sc_mmm2.update(sc_mmm2_2)
    # X2 <- media variables + ctrl variable
    X2 = pd.concat([X_media2, X_ctrl2, X_media_comp2], axis=1)

    # 1. compute each media/control factor: 
    # log-log model: log(sales) = log(X[0])*beta[0] + ... + log(X[13])*beta[13] + tau
    # multiplicative model: sales = X[0]^beta[0] * ... * X[13]^beta[13] * e^tau
    # each factor = X[i]^beta[i]
    # intercept = e^tau
    # Here is transformation from log-log to base
    # Indeed all the beta for parameter estimated are ^(**). B[9] = beta[num_media+i] = beta for base_sales
    factor_df = pd.DataFrame(columns=media_vars+ctrl_vars+media_comp_vars+['intercept'])
    for i in range(num_media):
        colname = media_vars[i]
        factor_df[colname] = X2[colname] ** beta[i]
    for i in range(num_ctrl):
        colname = ctrl_vars[i]
        factor_df[colname] = X2[colname] ** beta[num_media+i]
    for i in range(num_media_comp):
        colname = media_comp_vars[i]
        factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))
    factor_df['intercept'] = np.exp(tau)

    # 2. calculate the product of all factors -> y_pred
    # baseline = intercept * control factor = e^tau * X[13]^beta[13]
    # .apply(np.prod, axis = 1. Multiplication for every row)
    y_pred = factor_df.drop(mdip_comp_cols, axis =1).apply(np.prod, axis=1)
    factor_df['y_pred'], factor_df['y_true2'] = y_pred, y_true2
    factor_df['baseline'] = factor_df[['intercept']+ctrl_vars].apply(np.prod, axis=1)

    # 3. calculate each media factor's contribution
    # media contribution = total volume – volume upon removal of the media factor
    mc_df = pd.DataFrame(columns=media_vars+media_comp_vars+['baseline'])
    for col in (media_vars):
        mc_df[col] = factor_df['y_true2'] - factor_df['y_true2']/factor_df[col]
    for col in (media_comp_vars):
        mc_df[col] = factor_df['y_true2'] + factor_df['y_true2']/factor_df[col]
    mc_df['baseline'] = factor_df['baseline']
    mc_df['y_true2'] = factor_df['y_true2']

    # 4. scale contribution
    # difference between mc_true(y_true2 - baseline) and mc_pred. This difference 
    # used to regauge/scale/proportion w.r.t. each media factors
    # predicted total media contribution: product of all media factors
    mc_df['mc_pred'] = mc_df[media_vars + media_comp_vars].apply(np.sum, axis=1)
    # true total media contribution: total volume - baseline
    mc_df['mc_true'] = mc_df['y_true2'] - mc_df['baseline']
    
    # 5. scale mc_df based on original sales
    mc_df['total_volume'] = original_sales
    for col in media_vars+media_comp_vars+['baseline']:
        mc_df[col] = mc_df[col]*mc_df['total_volume']/mc_df['y_true2']
    
    print('rmse (log-log model): ', 
         (mean_squared_error(y_true2, y_pred, squared = False))/(y_true2.mean())*100)
    print('mape (multiplicative model): ', 
         mean_absolute_percentage_error(y_true2, y_pred))
    return mc_df

###calculate media contribution percentage

def calc_media_contrib_pct(mc_df, media_vars=mdip_cols, media_comp_vars = mdip_comp_cols, sales_col='total_volume', period=52):
    '''
    returns:
    mc_pct: percentage over total sales
    mc_pct2: percentage over incremental sales (sales contributed by media channels)
    '''
    mc_pct = {}
    mc_pct2 = {}
    s = 0
    if period is None:
        for col in (media_vars+media_comp_vars+['baseline']):
            mc_pct[col] = (mc_df[col]/mc_df[sales_col]).mean()
    else:
        for col in (media_vars+media_comp_vars+['baseline']):
            mc_pct[col] = (mc_df[col]/mc_df[sales_col])[-period:].mean()
    for m in media_vars:
        s += mc_pct[m]
    for m in (media_vars+media_comp_vars):
        mc_pct2[m] = mc_pct[m]/s
    return mc_pct, mc_pct2

As always thank you for your help, hoping these modifications can be helpful or at least interesting

I like the idea of adding competitor effect!

Before diving into the code, I have some questions.

  1. Definition of your media competitor variables, are they media impression/spend of your competitors?
  2. Above all, the media competitor variables, are they control variables or media variables?
    Often competing strength/competition score is a control variable in MMM, media variables only include your own channels. So I tend to believe they are control variables.
    If you consider them as media variable:
    In multiplicative model, media effects are multiplicative, media channels act synergically. It's okay to assume your own channels work together, but for your competitors' channels, are they working so closely with your own channels? Do they have equal importance as your own channels on your sales?

How to calculate the y_pred considering also the effect of the media competitors?
No need to do this:

for i in range(num_media_comp):
        colname = media_comp_vars[i]
        factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))
y_pred = factor_df.drop(mdip_comp_cols, axis =1).apply(np.prod, axis=1)

Your mdip_comp factors are not negative, 100^(-0.05) is a positive number.
If that was the only obstacle (now resolved), I think the mmm_decompose_contrib() function still works for you.

Emnlv commented

Happy that you like the idea!

I will better specify the variables I used.

In the First Model, I added the competitor promo as negative (X3): this is for helping to calculate the base_sales.

In the Second Model, I don't get why should I to consider them positive?
I am using media metrics of the competitors, and I am doing this for two main reasons:

  1. As my media, also their media (the competitor ones) have to behave to a proprer adstock rules.
  2. Because my assumption is that they have negative contribution on my media effect (they can absorb part of my media effect).
    So, do you suggest to leave the code as you wrote, but at the end, when I have all the data transformed, to put a negative sign to the transformed competitor media? This can help of course to calculate the y_pred and also the MAPE.
    If it is like this, I am getting it ;)

My thoughts...

  1. As my media, also their media (the competitor ones) have to behave to a proprer adstock rules.
    You can apply adstock transformation to them when they are control variables.
  2. Because my assumption is that they have negative contribution on my media effect (they can absorb part of my media effect).
    Being control variables, they also have negative effect on your sales.

I saw agencies/papers always use competition as control variable. I haven't dealt with competitors data (I did MMM in-house), so I feel not competent to give instructions on this. It's unconventional to use them as focal variables, I can't tell it's a good or bad practice, you're free to try. I'm interested in your result.

Computing media contribution
Here's a simplified example.

y = baseline * x1^b1 * x2^b2
y: sales
x1: your channel, b1 > 0
x2: competitor's channel, b2 < 0

baseline=1000
x1=500, b1=0.03, x1^b1=1.2
x2=100, b2=-0.01, x2^b2=0.95
y_pred = 1000 * 1.2 * 0.95 = 1140
contribution of your channel: c1 = 1140 - 1140/1.2 = 190
contribution of competitor's channel: c2 = 1140 - 1140/0.95 = -60
Suppose real sales y_true = 1100
Predicted total media contribution: mc_pred = c1 + c2 = 190 + (-60) = 130
True total media contribution: mc_true = 1100 - 1000 = 100
Scale:
c1 = 190 * (100/130) = 146.15
c2 = -60 * (100/130) = -46.15

About code:

for i in range(num_media_comp):
        colname = media_comp_vars[i]
        factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i])) # no need to flip the sign -> factor_df[colname] = (X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))
y_pred = factor_df.drop(mdip_comp_cols, axis =1).apply(np.prod, axis=1) # no need to drop mdip_comp_cols

(Sorry I don't have time to follow chunks of code, I only looked at these problematic lines you highlighted.)

Emnlv commented

Hi Sibyl,
Thank you for your honesty.
You are definitely right, I made the modification you said and it works perfectly.
Now I have the correct MAPE, having the variables behave how I want.

I did this little modification, inside the mmm_decompose_contrib.

# 3. calculate each media factor's contribution
    # media contribution = total volume – volume upon removal of the media factor
    mc_df = pd.DataFrame(columns=media_vars+media_comp_vars+['baseline'])
    for col in (media_vars):
        mc_df[col] = factor_df['y_true2'] - factor_df['y_true2']/factor_df[col]
    for col in (media_comp_vars):
        mc_df[col] = factor_df['y_true2'] + factor_df['y_true2']/-factor_df[col]
    mc_df['baseline'] = factor_df['baseline']
    mc_df['y_true2'] = factor_df['y_true2']

Really thank you for your suggestions and help!