google/lifetime_value

Why does the LTV prediction part multiply the predicted probability by the expectation to get the final LTV prediction?

RuiSUN1124 opened this issue · 2 comments

I also found it strange.

As far as I understand, the regression part of the model is trained only on the subset of customers with an observed non-zero LTV:

positive = tf.cast(labels > 0, tf.float32)

# Zero labels are replaced with ones so that log_prob is well-defined;
# those terms are masked out again by `positive` in the loss below.
safe_labels = positive * labels + (
      1 - positive) * tf.keras.backend.ones_like(labels)

# Log-normal negative log-likelihood, computed on positive labels only.
regression_loss = -tf.keras.backend.mean(
      positive * tfd.LogNormal(loc=loc, scale=scale).log_prob(safe_labels),
      axis=-1)
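(For context, the regression term above is only half of the loss. A condensed, self-contained sketch of how the full loss combines the two parts, close to zero_inflated_lognormal_loss in this repo's zero_inflated_lognormal.py: the model emits three values per example, a logit for P(y > 0) plus the log-normal loc and a raw scale.)

import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

def zero_inflated_lognormal_loss(labels, logits):
  # ZILN loss sketch: binary cross-entropy on the zero / non-zero
  # indicator, plus a log-normal NLL on y | y > 0.
  labels = tf.cast(labels, tf.float32)
  positive = tf.cast(labels > 0, tf.float32)

  positive_logits = logits[..., :1]  # logit for P(y > 0)
  loc = logits[..., 1:2]             # log-normal location
  scale = tf.math.maximum(           # log-normal scale, kept positive
      tf.keras.backend.softplus(logits[..., 2:]),
      tf.math.sqrt(tf.keras.backend.epsilon()))

  classification_loss = tf.keras.losses.binary_crossentropy(
      positive, positive_logits, from_logits=True)

  safe_labels = positive * labels + (
      1 - positive) * tf.keras.backend.ones_like(labels)
  regression_loss = -tf.keras.backend.mean(
      positive * tfd.LogNormal(loc=loc, scale=scale).log_prob(safe_labels),
      axis=-1)

  return classification_loss + regression_loss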

If loc and scale give the most accurate predictions on this subset of customers, then

preds = (positive_probs *
      tf.keras.backend.exp(loc + 0.5 * tf.keras.backend.square(scale)))

gives a shifted estimate in the general case, since positive_probs is not 0 or 1, but somewhere in between.
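(As a side note, exp(loc + 0.5 * scale^2) is indeed the mean of a LogNormal(loc, scale) distribution, which is easy to check numerically; the loc and scale values here are arbitrary.)

import numpy as np

rng = np.random.default_rng(0)
loc, scale = 1.5, 0.8

# Monte Carlo check that E[LogNormal(loc, scale)] = exp(loc + scale^2 / 2).
samples = rng.lognormal(mean=loc, sigma=scale, size=1_000_000)
print(samples.mean())                  # roughly 6.17
print(np.exp(loc + 0.5 * scale ** 2))  # 6.1719...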

I think the probability estimated by the classification part of the model should somehow be taken into account by the regression part of the model.

It actually makes perfect sense if you think about the intention of a zero-inflated log-normal method.

Imagine a simple case where a customer has an LTV of either $0 with 99% probability, or an LTV of exactly $100 otherwise.

When we use a zero-inflated method for LTV, we estimate the probability mass of zero-LTV customers (classification) and the conditional expected LTV of the non-zero-LTV customers (regression).

So in the case above, a perfect model would estimate that the customer has a 1% chance of having a non-zero LTV, and that, conditional on being non-zero, their expected LTV is $100.

But if we just took the regression output, we would say the expected LTV of our customers is $100, which is clearly not true. We have to multiply the probability of the customer being non-zero by their expected LTV conditional on being non-zero.

If we assume that y is non-negative, then we can see that:

E(y) = P(y > 0) * E(y | y > 0) + P(y = 0) * E(y | y = 0)
E(y) = P(y > 0) * E(y | y > 0) + P(y = 0) * (0)
E(y) = P(y > 0) * E(y | y > 0)
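Plugging the example above into this identity (a quick check in plain Python):

p_positive = 0.01          # P(y > 0): 1% chance of a non-zero LTV
ev_given_positive = 100.0  # E(y | y > 0): $100 when non-zero

# E(y) = P(y > 0) * E(y | y > 0)
expected_ltv = p_positive * ev_given_positive
print(expected_ltv)  # 1.0, i.e. $1 per customer, not $100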

Our model is essentially estimating P(y > 0) with the classification output and E(y | y > 0) with the regression output.

So that is why we multiply the probability of non-zero LTV by the conditional expected LTV: it gives the unconditional expected LTV we actually care about, E(y).
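(Concretely, the prediction step looks something like the following sketch, close to zero_inflated_lognormal_pred in this repo; the sigmoid turns the classification logit into positive_probs.)

import tensorflow as tf

def zero_inflated_lognormal_pred(logits):
  # E(y) = P(y > 0) * E(y | y > 0) under the ZILN parameterization.
  positive_probs = tf.keras.backend.sigmoid(logits[..., :1])  # P(y > 0)
  loc = logits[..., 1:2]
  scale = tf.keras.backend.softplus(logits[..., 2:])
  # exp(loc + scale^2 / 2) is the mean of LogNormal(loc, scale),
  # i.e. E(y | y > 0).
  return positive_probs * tf.keras.backend.exp(
      loc + 0.5 * tf.keras.backend.square(scale))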