shokru/mlfactor.github.io

Chp03.Rmd line 124: error with the return lag

Closed this issue · 6 comments

data_FM <- left_join(data_ml %>%                                    # Join the 2 datasets
                         dplyr::select(date, stock_id, R1M_Usd) %>% # (with returns...
                         filter(stock_id %in% stock_ids_short),     # ... over some stocks)
                     FF_factors, 
                     by = "date") %>% 
    mutate(R1M_Usd = lag(R1M_Usd)) %>%                              # Lag returns
    na.omit() %>%                                                   # Remove missing points
    spread(key = stock_id, value = R1M_Usd)

In the above code block (Chp03.Rmd), we should group by stock_id before lagging the returns.

In the current code setup, stock 1's return in 2018-12 will be shifted into stock 3's return in 2000-01 (which can be observed in the data_FM dataframe).
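
To make the leak concrete, here is a minimal sketch on a toy panel. The data frame `toy`, its values, and the column `R_lag` are made up for illustration; only the names `stock_id`, `date` and `R1M_Usd` come from the code above. It assumes rows are sorted by stock and then by date.

```r
library(dplyr)

# Hypothetical toy panel: 2 stocks, 3 months each, sorted by stock then date
toy <- tibble(
  stock_id = c(1, 1, 1, 3, 3, 3),
  date     = as.Date(rep(c("2000-01-31", "2000-02-29", "2000-03-31"), 2)),
  R1M_Usd  = c(0.01, 0.02, 0.03, 0.10, 0.20, 0.30)
)

# Ungrouped lag: the last return of stock 1 leaks into the first row of stock 3
ungrouped <- toy %>%
  mutate(R_lag = dplyr::lag(R1M_Usd))

# Grouped lag: the first row of each stock becomes NA instead
grouped <- toy %>%
  group_by(stock_id) %>%
  mutate(R_lag = dplyr::lag(R1M_Usd)) %>%
  ungroup()
```

In `ungrouped`, the first row of stock 3 carries stock 1's last return (0.03). In `grouped`, that row is NA and would be dropped by a later `na.omit()`.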

Betas look somewhat different.

Regression result with groupby(stock_id):
[Screenshot: betas_w_groupby]

Regression result without groupby(stock_id):
[Screenshot: betas_wo_groupby]

Another related question: why do we lag the return?

[Screenshot: df]

In the above screenshot, -0.036 is stock 1's return in Jan. 2000. If we lag the return, then, when running regressions, we are essentially using Feb. 2000's factors to predict Jan. 2000's return.

Shouldn't we build a model that can use factors available at t to predict stock returns for the next month (t+1)?

Thanks!

You are correct on the first remark. Indeed, the data should be grouped before lagging.
Luckily, it only affects a small portion of returns, which indeed shifts betas, but only marginally.
I will correct this in the next version, which I will release asap (maybe this weekend).

As for your last point, it's in fact an open question. It depends on what you want to do.
In the original 1973 paper, the regressions are not predictive (returns & loadings are synchronous), so the purpose is to explain.
But indeed, you could very well use the forward returns, in which case you would predict.
Fama-MacBeth is used to compute so-called "market prices of risk" (or risk premia) associated with factors.
Personally, I don't use it as a forecasting tool...

Thanks for the correction!

Got it. Thank you very much for the quick reply.

So, just to clarify, the reason we lag the return is that we want to use t+1's factors to explain t's returns.

Do I understand the interpretation correctly?

Thanks!

In this case, since the R1M_Usd are the 1-month future returns, lagging shifts them into the past, so they become synchronous (and no longer predictive).
We are implementing the original version of the estimation. But many variations have blossomed since then, and there is no one dominant paradigm (I think).
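
The alignment can be checked on a toy price series. This is only a sketch: the prices and the column names `R_realized`, `R1M_fwd` and `R1M_lagged` are hypothetical, and it assumes the factor row dated t describes the month ending at t.

```r
library(dplyr)

# Hypothetical month-end prices for a single stock
toy <- tibble(
  date  = as.Date(c("2000-01-31", "2000-02-29", "2000-03-31", "2000-04-30")),
  price = c(100, 110, 99, 108.9)
) %>%
  mutate(
    R_realized = price / dplyr::lag(price) - 1,   # return over the month ending at `date`
    R1M_fwd    = dplyr::lead(price) / price - 1,  # return over the month following `date`
    R1M_lagged = dplyr::lag(R1M_fwd)              # forward return shifted back by one row
  )
```

Here `R1M_lagged` matches `R_realized` on every row, so the lagged forward return is the return realized over the month ending at `date`. Under the stated assumption, it is synchronous with the factors on the same row.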

Got it. Thank you and thanks for the great book.