Plug in for Media competitors and AdjR2
Emnlv opened this issue · 4 comments
Hi Sibyl,
Sorry, I meant to get back to you about this question:
"Why do you want to avoid 0's, or what's your purpose for changing the adstock? I don't understand your problem and goal, if you let me know, I might have more specific answer" In the end you were right: it didn't change anything, so it's correct to leave the code as it is.
I am sharing some chunks of the code that I added/modified, hoping they can be helpful.
Here I add a new block to the First Model to compute the R2 and the adjusted R2 (I don't know how helpful it is, but I hope a little bit). Just plug it in at the end of the Stan code.
###FIRST MODEL
###STAN CODE
generated quantities {
  real rss;
  real totalss;
  real<upper=1> R2;
  real<upper=1> AdjR2;
  vector[N] mu;
  mu <- X1 * beta1 + X2 * beta2 + X3 * beta3 + alpha;
  rss <- dot_self(y-mu);
  totalss <- dot_self(y-mean(y));
  R2 <- 1 - rss/totalss;
  AdjR2 <- 1 - ((1-R2)*(N-1))/(N - (K1+K2+K3) - 1);
}
'''
sm1 = pystan.StanModel(model_code=ctrl_code1, verbose=True)
fit1 = sm1.sampling(data=ctrl_data, iter=5000, chains=5, n_jobs=-1, seed=10)
fit1_result = fit1.extract()
fit1_result["R2"].mean()
fit1_result["AdjR2"].mean()
In the Second Model, I plugged in a new chunk to also include the media competitors' effect (forced to be negative). I create a new matrix Z with its own peak and decay parameters. In addition, it is possible to see the beta distributions for them.
However, I don't know if this is outside the scope of your publication.
Regarding this second step, I have one question. As you can see, I am calculating the y_pred like this:
y_pred = factor_df.drop(mdip_comp_cols, axis =1).apply(np.prod, axis=1)
where "mdip_comp_cols" are the columns of the media competitors. I did this to avoid the negative values since I calculated the beta competitors like this:
for i in range(num_media_comp):
    colname = media_comp_vars[i]
    factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))
Do you know how to calculate the y_pred considering also the effect of the media competitors?
In addition, I tried to extract the R2 and the AdjR2 but without getting results. At the same time, I don't know how useful it would be to have them. Finally, in this code I didn't consider the mc_delta for the proportion.
Here you can find the code. I start from the beginning of the Second Model because there are many steps in between.
It is only necessary to create this in the first lines of the import phase, also because it is needed to check for multicollinearity among media (see the VIF sketch below):
mdip_comp_cols = [col for col in df.columns if 'atl_' in col]
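For the multicollinearity check itself, a minimal sketch based on variance inflation factors, assuming statsmodels is installed (the thresholds are only rules of thumb):
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

X_vif = df[mdip_cols + mdip_comp_cols].assign(const=1.0)  # add an intercept column for VIF
vif = pd.Series(
    [variance_inflation_factor(X_vif.values, i) for i in range(X_vif.shape[1])],
    index=X_vif.columns,
)
print(vif.drop('const'))  # values above roughly 5-10 usually flag problematic collinearity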
###SECOND MODEL
df_mmm, sc_mmm = mean_log1p_trandform(df, ['total_volume', 'base_sales'])
mu_mdip = df[mdip_cols].apply(np.mean, axis=0).values
mu_mdip_comp = df[mdip_comp_cols].apply(np.mean, axis=0).values
max_lag = 8
num_media = len(mdip_cols)
num_media_comp = len(mdip_comp_cols)
X_media = np.concatenate((np.zeros((max_lag-1, num_media)), df[mdip_cols].values), axis=0)
X_comp_media = np.concatenate((np.zeros((max_lag-1, num_media_comp)), df[mdip_comp_cols].values), axis=0)
X_ctrl = df_mmm['base_sales'].values.reshape(len(df),1)
###STAN CODE
model_data2 = {
    'N': len(df),
    'max_lag': max_lag,
    'num_media': num_media,
    'num_media_comp': num_media_comp,
    'X_media': X_media,
    'X_media_comp': X_comp_media,
    'mu_mdip': mu_mdip,
    'mu_mdip_comp': mu_mdip_comp,
    'num_ctrl': X_ctrl.shape[1],
    'X_ctrl': X_ctrl,
    'y': df_mmm['total_volume'].values
}
model_code2 = '''
functions {
  // the adstock transformation with a vector of weights
  real Adstock(vector t, row_vector weights) {
    return dot_product(t, weights) / sum(weights);
  }
}
data {
  // the total number of observations
  int<lower=1> N;
  // the vector of sales
  real y[N];
  // the maximum duration of lag effect, in weeks
  int<lower=1> max_lag;
  // the number of media channels
  int<lower=1> num_media;
  // the number of competitors' media channels (TV)
  int<lower=1> num_media_comp;
  // matrix of media variables
  matrix[N+max_lag-1, num_media] X_media;
  // matrix of competitor media variables
  matrix[N+max_lag-1, num_media_comp] X_media_comp;
  // vector of media variables' means
  real mu_mdip[num_media];
  // vector of competitor media variables' means
  real mu_mdip_comp[num_media_comp];
  // the number of other control variables
  int<lower=1> num_ctrl;
  // a matrix of control variables
  matrix[N, num_ctrl] X_ctrl;
}
parameters {
  // residual variance
  real<lower=0> noise_var;
  // the intercept
  real tau;
  // the coefficients for media variables and base sales
  vector<lower=0>[num_media+num_ctrl] beta;
  // the coefficients for competitor media variables
  vector<upper=0>[num_media_comp] beta2;
  // the decay and peak parameters for the adstock transformation of each media
  vector<lower=0,upper=1>[num_media] decay;
  vector<lower=0,upper=ceil(max_lag/2)>[num_media] peak;
  vector<lower=0,upper=1>[num_media_comp] decay2;
  vector<lower=0,upper=ceil(max_lag/2)>[num_media_comp] peak2;
}
transformed parameters {
  // the cumulative media effect after adstock
  real cum_effect;
  // the cumulative media effect after adstock of competitors
  real cum_effect_comp;
  // matrix of media variables after adstock
  matrix[N, num_media] X_media_adstocked;
  // matrix of competitor media variables after adstock
  matrix[N, num_media_comp] Z_media_adstocked;
  // matrix of all predictors
  matrix[N, num_media+num_ctrl] X;
  // matrix of all competitor predictors
  matrix[N, num_media_comp] Z;
  // adstock, mean-center, log1p transformation
  row_vector[max_lag] lag_weights;
  // adstock, mean-center, log1p transformation - for competitors
  row_vector[max_lag] lag_weights_comp;
  for (nn in 1:N) {
    for (media in 1 : num_media) {
      for (lag in 1 : max_lag) {
        lag_weights[max_lag-lag+1] <- pow(decay[media], (lag - 1 - peak[media]) ^ 2);
      }
      cum_effect <- Adstock(sub_col(X_media, nn, media, max_lag), lag_weights);
      X_media_adstocked[nn, media] <- log1p(cum_effect/mu_mdip[media]);
    }
    X <- append_col(X_media_adstocked, X_ctrl);
  }
  for (nn in 1:N) {
    for (media_c in 1 : num_media_comp) {
      for (lag in 1 : max_lag) {
        lag_weights_comp[max_lag-lag+1] <- pow(decay2[media_c], (lag - 1 - peak2[media_c]) ^ 2);
      }
      cum_effect_comp <- Adstock(sub_col(X_media_comp, nn, media_c, max_lag), lag_weights_comp);
      Z_media_adstocked[nn, media_c] <- log1p(cum_effect_comp/mu_mdip_comp[media_c]);
    }
    Z <- Z_media_adstocked;
  }
}
model {
  decay ~ beta(3,3);
  peak ~ uniform(0, ceil(max_lag/2));
  decay2 ~ beta(3,3);
  peak2 ~ uniform(0, ceil(max_lag/2));
  tau ~ normal(0, 5);
  for (i in 1 : num_media+num_ctrl) {
    beta[i] ~ normal(0, 1);
  }
  for (i in 1 : num_media_comp) {
    beta2[i] ~ normal(0, 1);
  }
  noise_var ~ inv_gamma(0.05, 0.05 * 0.01);
  y ~ normal(tau + X * beta + Z * beta2, sqrt(noise_var));
}
'''
###FITTING AND EXTRACTION OF RESULTS
sm2 = pystan.StanModel(model_code=model_code2, verbose=True)
fit2 = sm2.sampling(data=model_data2, iter=4000, chains=5, n_jobs=-1, seed=10)
fit2_result = fit2.extract()
stan_utility.check_all_diagnostics(fit2)
def extract_mmm(fit_result, max_lag=max_lag, media_vars=mdip_cols, media_comp_vars=mdip_comp_cols, ctrl_vars=['base_sales'], extract_param_list=True):
    mmm = {}
    mmm['max_lag'] = max_lag
    mmm['media_vars'], mmm['media_comp_vars'], mmm['ctrl_vars'] = media_vars, media_comp_vars, ctrl_vars
    mmm['decay'] = decay = fit_result['decay'].mean(axis=0).tolist()
    mmm['decay2'] = decay2 = fit_result['decay2'].mean(axis=0).tolist()
    mmm['peak'] = peak = fit_result['peak'].mean(axis=0).tolist()
    mmm['peak2'] = peak2 = fit_result['peak2'].mean(axis=0).tolist()
    mmm['beta'] = fit_result['beta'].mean(axis=0).tolist()
    mmm['beta2'] = fit_result['beta2'].mean(axis=0).tolist()
    mmm['tau'] = fit_result['tau'].mean()
    if extract_param_list:
        mmm['decay_list'] = fit_result['decay'].tolist()
        mmm['decay2_list'] = fit_result['decay2'].tolist()
        mmm['peak_list'] = fit_result['peak'].tolist()
        mmm['peak2_list'] = fit_result['peak2'].tolist()
        mmm['beta_list'] = fit_result['beta'].tolist()
        mmm['beta2_list'] = fit_result['beta2'].tolist()
        mmm['tau_list'] = fit_result['tau'].tolist()
    adstock_params = {}
    media_names = [col.replace('metric_', '') for col in media_vars]
    for i in range(len(media_names)):
        adstock_params[media_names[i]] = {
            'L': max_lag,
            'P': peak[i],
            'D': decay[i]
        }
    mmm['adstock_params'] = adstock_params
    adstock_params_comp = {}
    media_comp_names = [col.replace('atl_', '') for col in media_comp_vars]
    for i in range(len(media_comp_names)):
        adstock_params_comp[media_comp_names[i]] = {
            'L': max_lag,
            'P': (peak2 if type(peak2) == float else peak2[i]),
            'D': (decay2 if type(decay2) == float else decay2[i])
        }
    mmm['adstock_params_comp'] = adstock_params_comp
    return mmm
mmm = extract_mmm(fit2, max_lag=max_lag, media_vars=mdip_cols, media_comp_vars=mdip_comp_cols, ctrl_vars=['base_sales'])
###BETA PLOT
beta_media = {}
for i in range(len(mmm['media_vars'])):
    md = mmm['media_vars'][i]
    betas = []
    for j in range(len(mmm['beta_list'])):
        betas.append(mmm['beta_list'][j][i])
    beta_media[md] = np.array(betas)
f = plt.figure(figsize=(18,15))
for i in range(len(mmm['media_vars'])):
    ax = f.add_subplot(5,3,i+1)
    md = mmm['media_vars'][i]
    x = beta_media[md]
    mean_x = x.mean()
    median_x = np.median(x)
    ax = sns.distplot(x)
    ax.axvline(mean_x, color='r', linestyle='-')
    ax.axvline(median_x, color='g', linestyle='-')
    ax.set_title(md)
###BETA COMP
beta_media2 = {}
for i in range(len(mmm['media_comp_vars'])):
    md = mmm['media_comp_vars'][i]
    betas = []
    for j in range(len(mmm['beta2_list'])):
        betas.append(mmm['beta2_list'][j] if type(mmm['beta2']) == float else mmm['beta2_list'][j][i])
    beta_media2[md] = np.array(betas)
f = plt.figure(figsize=(18,15))
for i in range(len(mmm['media_comp_vars'])):
    ax = f.add_subplot(5,3,i+1)
    md = mmm['media_comp_vars'][i]
    x = beta_media2[md]
    mean_x = x.mean()
    median_x = np.median(x)
    ax = sns.distplot(x)
    ax.axvline(mean_x, color='r', linestyle='-')
    ax.axvline(median_x, color='g', linestyle='-')
    ax.set_title(md)
###MEDIA CONTRIBUTION
def mmm_decompose_contrib(mmm, df, original_sales=df['total_volume']):
    # adstock params
    adstock_params = mmm['adstock_params']
    adstock_params_comp = mmm['adstock_params_comp']
    # coefficients, intercept
    beta, beta2, tau = mmm['beta'], mmm['beta2'], mmm['tau']
    # variables
    media_vars, media_comp_vars, ctrl_vars = mmm['media_vars'], mmm['media_comp_vars'], mmm['ctrl_vars']
    num_media, num_media_comp, num_ctrl = len(media_vars), len(media_comp_vars), len(ctrl_vars)
    # X_media2: adstocked, mean-centered media variables + 1
    # apply the adstock formula to every channel, transforming the values
    X_media2 = adstock_transform(df, media_vars, adstock_params)
    X_media_comp2 = adstock_transform(df, media_comp_vars, adstock_params_comp)
    # mean-center all media variables (sc holds the means)
    X_media2, sc_mmm2 = mean_center_trandform(X_media2, media_vars)
    X_media_comp2, sc_mmm_comp_2 = mean_center_trandform(X_media_comp2, media_comp_vars)
    # add 1 to every channel, so every 0 becomes 1
    X_media2 = X_media2 + 1
    X_media_comp2 = X_media_comp2 + 1
    # X_ctrl2: mean-centered control variables + 1
    # mean-center the base_sales
    X_ctrl2, sc_mmm2_1 = mean_center_trandform(df[ctrl_vars], ctrl_vars)
    X_ctrl2 = X_ctrl2 + 1
    # y_true2: mean-centered sales variable + 1
    y_true2, sc_mmm2_2 = mean_center_trandform(df, ['total_volume'])
    y_true2 = y_true2 + 1
    # add the entries of sc_mmm2_1 and sc_mmm2_2 to sc_mmm2 -- why?
    sc_mmm2.update(sc_mmm2_1)
    sc_mmm2.update(sc_mmm2_2)
    # X2 <- media variables + ctrl variables + competitor media variables
    X2 = pd.concat([X_media2, X_ctrl2, X_media_comp2], axis=1)
    # 1. compute each media/control factor:
    # log-log model: log(sales) = log(X[0])*beta[0] + ... + log(X[13])*beta[13] + tau
    # multiplicative model: sales = X[0]^beta[0] * ... * X[13]^beta[13] * e^tau
    # each factor = X[i]^beta[i]
    # intercept = e^tau
    # this is the transformation from log-log back to the original scale:
    # every estimated beta is used as an exponent; beta[num_media+i] is the beta for base_sales
    factor_df = pd.DataFrame(columns=media_vars+ctrl_vars+media_comp_vars+['intercept'])
    for i in range(num_media):
        colname = media_vars[i]
        factor_df[colname] = X2[colname] ** beta[i]
    for i in range(num_ctrl):
        colname = ctrl_vars[i]
        factor_df[colname] = X2[colname] ** beta[num_media+i]
    for i in range(num_media_comp):
        colname = media_comp_vars[i]
        factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))
    factor_df['intercept'] = np.exp(tau)
    # 2. calculate the product of all factors -> y_pred
    # baseline = intercept * control factor = e^tau * X[13]^beta[13]
    # .apply(np.prod, axis=1) multiplies across every row
    y_pred = factor_df.drop(mdip_comp_cols, axis=1).apply(np.prod, axis=1)
    factor_df['y_pred'], factor_df['y_true2'] = y_pred, y_true2
    factor_df['baseline'] = factor_df[['intercept']+ctrl_vars].apply(np.prod, axis=1)
    # 3. calculate each media factor's contribution
    # media contribution = total volume - volume upon removal of the media factor
    mc_df = pd.DataFrame(columns=media_vars+media_comp_vars+['baseline'])
    for col in (media_vars):
        mc_df[col] = factor_df['y_true2'] - factor_df['y_true2']/factor_df[col]
    for col in (media_comp_vars):
        mc_df[col] = factor_df['y_true2'] + factor_df['y_true2']/factor_df[col]
    mc_df['baseline'] = factor_df['baseline']
    mc_df['y_true2'] = factor_df['y_true2']
    # 4. scale contribution
    # the difference between mc_true (y_true2 - baseline) and mc_pred is used to
    # rescale/proportion each media factor's contribution
    # predicted total media contribution: sum of all media contributions
    mc_df['mc_pred'] = mc_df[media_vars + media_comp_vars].apply(np.sum, axis=1)
    # true total media contribution: total volume - baseline
    mc_df['mc_true'] = mc_df['y_true2'] - mc_df['baseline']
    # 5. scale mc_df based on original sales
    mc_df['total_volume'] = original_sales
    for col in media_vars+media_comp_vars+['baseline']:
        mc_df[col] = mc_df[col]*mc_df['total_volume']/mc_df['y_true2']
    print('rmse (log-log model): ',
          (mean_squared_error(y_true2, y_pred, squared=False))/(y_true2.mean())*100)
    print('mape (multiplicative model): ',
          mean_absolute_percentage_error(y_true2, y_pred))
    return mc_df
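For reference, a hypothetical call, using the variable names defined above:
mc_df = mmm_decompose_contrib(mmm, df, original_sales=df['total_volume'])
mc_df[['mc_pred', 'mc_true', 'baseline']].head()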
###calculate media contribution percentage
def calc_media_contrib_pct(mc_df, media_vars=mdip_cols, media_comp_vars=mdip_comp_cols, sales_col='total_volume', period=52):
    '''
    returns:
    mc_pct: percentage over total sales
    mc_pct2: percentage over incremental sales (sales contributed by media channels)
    '''
    mc_pct = {}
    mc_pct2 = {}
    s = 0
    if period is None:
        for col in (media_vars+media_comp_vars+['baseline']):
            mc_pct[col] = (mc_df[col]/mc_df[sales_col]).mean()
    else:
        for col in (media_vars+media_comp_vars+['baseline']):
            mc_pct[col] = (mc_df[col]/mc_df[sales_col])[-period:].mean()
    for m in media_vars:
        s += mc_pct[m]
    for m in (media_vars+media_comp_vars):
        mc_pct2[m] = mc_pct[m]/s
    return mc_pct, mc_pct2
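And a hypothetical call, keeping the default of the last 52 weeks:
mc_pct, mc_pct2 = calc_media_contrib_pct(mc_df, media_vars=mdip_cols, media_comp_vars=mdip_comp_cols, sales_col='total_volume', period=52)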
As always, thank you for your help. I hope these modifications are helpful, or at least interesting.
I like the idea of adding competitor effect!
Before diving into the code, I have some questions.
- Definition of your media competitor variables: are they media impressions/spend of your competitors?
- Above all, are the media competitor variables control variables or media variables?
Often competitive strength/competition score is a control variable in MMM, and media variables only include your own channels. So I tend to believe they are control variables.
If you consider them as media variables:
In a multiplicative model, media effects are multiplicative and media channels act synergistically. It's okay to assume your own channels work together, but are your competitors' channels really working that closely with your own? Do they have the same importance for your sales as your own channels?
How to calculate the y_pred considering also the effect of the media competitors?
No need to do this:
for i in range(num_media_comp):
    colname = media_comp_vars[i]
    factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))
y_pred = factor_df.drop(mdip_comp_cols, axis=1).apply(np.prod, axis=1)
Your mdip_comp factors are not negative: 100^(-0.05) is a positive number.
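A quick check in Python confirms this:
>>> 100 ** -0.05
0.7943282347242815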
If that was the only obstacle (now resolved), I think the mmm_decompose_contrib() function still works for you.
Happy that you like the idea!
Let me specify better the variables I used.
In the First Model, I added the competitor promo as negative (X3): this helps to calculate the base_sales.
In the Second Model, I don't get why I should consider them positive.
I am using media metrics of the competitors, and I am doing this for two main reasons:
- Like my media, their media (the competitors') should also follow proper adstock rules.
- Because my assumption is that they have a negative contribution to my media effect (they can absorb part of my media effect).
So, do you suggest leaving the code as you wrote it, and at the end, once all the data is transformed, putting a negative sign on the transformed competitor media? This would of course help to calculate the y_pred and also the MAPE.
If that's the case, I get it ;)
My thoughts...
- Like my media, their media (the competitors') should also follow proper adstock rules.
You can still apply the adstock transformation to them when they are control variables.
- Because my assumption is that they have a negative contribution to my media effect (they can absorb part of my media effect).
As control variables, they can also have a negative effect on your sales.
I've seen agencies/papers always use competition as a control variable. I haven't dealt with competitor data (I did MMM in-house), so I don't feel competent to give instructions on this. It's unconventional to use them as focal variables; I can't tell whether it's good or bad practice, you're free to try. I'm interested in your results.
Computing media contribution
Here's a simplified example.
y = baseline * x1^b1 * x2^b2
y: sales
x1: your channel, b1 > 0
x2: competitor's channel, b2 < 0
baseline=1000
x1=500, b1=0.03, x1^b1=1.2
x2=100, b2=-0.01, x2^b2=0.95
y_pred = 1000 * 1.2 * 0.95 = 1140
contribution of your channel: c1 = 1140 - 1140/1.2 = 190
contribution of competitor's channel: c2 = 1140 - 1140/0.95 = -60
Suppose real sales y_true = 1100
Predicted total media contribution: mc_pred = c1 + c2 = 190 + (-60) = 130
True total media contribution: mc_true = 1100 - 1000 = 100
Scale:
c1 = 190 * (100/130) = 146.15
c2 = -60 * (100/130) = -46.15
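For what it's worth, the same toy numbers in Python (a sketch using the rounded factors from the example above):
baseline, f1, f2 = 1000, 1.2, 0.95      # f1 = x1^b1, f2 = x2^b2
y_pred = baseline * f1 * f2             # 1140
c1 = y_pred - y_pred / f1               # ~190, your channel
c2 = y_pred - y_pred / f2               # ~-60, competitor's channel
y_true = 1100
mc_pred = c1 + c2                       # ~130
mc_true = y_true - baseline             # 100
print(c1 * mc_true / mc_pred)           # ~146.15
print(c2 * mc_true / mc_pred)           # ~-46.15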
About the code:
for i in range(num_media_comp):
    colname = media_comp_vars[i]
    factor_df[colname] = -(X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))  # no need to flip the sign -> factor_df[colname] = (X2[colname] ** (beta2 if type(beta2) == float else beta2[i]))
y_pred = factor_df.drop(mdip_comp_cols, axis=1).apply(np.prod, axis=1)  # no need to drop mdip_comp_cols
(Sorry, I don't have time to go through all the chunks of code; I only looked at the problematic lines you highlighted.)
Hi Sibyl,
Thank you for your honesty.
You are definitely right: I made the modification you suggested and it works perfectly.
Now I get the correct MAPE, with the variables behaving the way I want.
I made this small modification inside mmm_decompose_contrib (with the competitor factors kept positive, y + y/(-factor) is the same as y - y/factor, so the competitor contributions come out negative):
# 3. calculate each media factor's contribution
# media contribution = total volume – volume upon removal of the media factor
mc_df = pd.DataFrame(columns=media_vars+media_comp_vars+['baseline'])
for col in (media_vars):
    mc_df[col] = factor_df['y_true2'] - factor_df['y_true2']/factor_df[col]
for col in (media_comp_vars):
    mc_df[col] = factor_df['y_true2'] + factor_df['y_true2']/-factor_df[col]
mc_df['baseline'] = factor_df['baseline']
mc_df['y_true2'] = factor_df['y_true2']
Thank you very much for your suggestions and help!