cvxgrp/cvxportfolio

MultiPeriodOptimization Behavior Documentation

When a user provides a returns forecast with a single forecasted return per timestamp, how does the model use it?

Does it look at the next planning_horizon bars of the forecast on that day (which would introduce look-ahead bias if the forecast was generated with data up to that timestamp), or does it persist the same forecast for planning_horizon additional bars of optimization?

I've read the docs and paper, but the way it is implemented is very unclear.

EDIT: I hadn't read your comment well. If you do nothing it simply carries over the value for the day (no look-ahead).

Hello, I also need to improve the docs on this. There are two ways to do it. One is to build an MPO policy by providing a list of objectives and a list of lists of constraints:

mpo_policy = cvx.MultiPeriodOptimization(
    objective = [cvx.ReturnsForecast(signal_today_dataframe) - gamma_risk * cvx.FullCovariance() - ...,
                cvx.ReturnsForecast(signal_tomorrow_dataframe) - gamma_risk * cvx.FullCovariance() - ...],
    constraints = [[cvx.LeverageLimit(3)],
                [cvx.LeverageLimit(3)]]
)

In that case the planning_horizon argument is unused, and you can change any term you want at each MPO step.

The other, in case you simply have slow and fast signals, is to use the decay argument of ReturnsForecast:

mpo_policy = cvx.MultiPeriodOptimization(
    objective = cvx.ReturnsForecast(fast_signal_dataframe, decay = .2)
              + cvx.ReturnsForecast(slow_signal_dataframe, decay = .8)
              - gamma_risk * cvx.FullCovariance(),
    constraints = [cvx.LeverageLimit(3)],
    planning_horizon = 5,
)
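
To make the decay argument more concrete, here is a minimal sketch in plain pandas. It assumes, based on my reading of the ReturnsForecast documentation, that at MPO step tau (tau = 0 being the current period) each forecast is multiplied by decay**tau. The names fast_signal, slow_signal and effective_forecast are hypothetical and not part of cvxportfolio; the numbers are made up.

```python
import pandas as pd

# Hypothetical one-day signals for two assets (values are made up).
fast_signal = pd.Series({"AAA": 0.010, "BBB": -0.004})
slow_signal = pd.Series({"AAA": 0.002, "BBB": 0.003})


def effective_forecast(fast, slow, planning_horizon, fast_decay=0.2, slow_decay=0.8):
    """Return one row per MPO step tau = 0..planning_horizon-1.

    Assumption: each forecast is scaled by decay**tau at step tau.
    """
    rows = {
        tau: fast * fast_decay**tau + slow * slow_decay**tau
        for tau in range(planning_horizon)
    }
    # Dict of Series -> columns are tau; transpose so that rows are tau.
    return pd.DataFrame(rows).T


steps = effective_forecast(fast_signal, slow_signal, planning_horizon=5)
print(steps)
```

Under that assumption the fast signal contributes almost nothing after two or three steps, while the slow signal still carries most of its weight at the end of the horizon.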

Let me know if this answers your question.

Thank you that clarifies perfectly. Very helpful.

I need to improve the docs for this; it might take a while so at least if someone comes here to ask something similar they find this.

One more question, about the indexing of signal_today_dataframe and signal_tomorrow_dataframe in:

mpo_policy = cvx.MultiPeriodOptimization(
    objective = [cvx.ReturnsForecast(signal_today_dataframe) - gamma_risk * cvx.FullCovariance() - ...,
                cvx.ReturnsForecast(signal_tomorrow_dataframe) - gamma_risk * cvx.FullCovariance() - ...],
    constraints = [[cvx.LeverageLimit(3)],
                [cvx.LeverageLimit(3)]]
)

Should the index be the date the prediction was made, or the date being predicted?
I.e., let's say I have a dataframe with columns:
t symbol pred_{t+1} pred_{t+2} pred_{t+3}

Is signal_tomorrow_dataframe going to be
t pred_{t+2} (so the index is the date the prediction was made)
or should it be:
t+1 pred_{t+2}
Or even
t pred_t

The timestamp refers to the time of each period in the back-test, say 9:30am EST on a Monday. Signal today, at timestamp 9:30am Monday, is then the prediction of the return from 9:30am Monday to 9:30am Tuesday; signal tomorrow, also at timestamp 9:30am Monday, is the prediction of the return from Tuesday to Wednesday; and so on. The time convention is the one defined in the paper, section 2. In practice you can assume that the signal for today is built knowing all data up to today's open price (and the open-to-open total return from yesterday's open to today's open). Does that make sense? I should definitely make these things clearer in the docs.

The policy objects receive a view (past_returns) of the open-to-open total returns up to the open of the day, as a dataframe. ReturnsForecast without arguments simply computes a .mean() of that, so each day it takes the full mean of all past returns for each name. In the user provided forecasters example you can see how to use the same model to do arbitrary forecasting.
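
The convention above can be sketched with plain pandas. This is only an illustration under stated assumptions: the open prices are made up, and opens, period_returns, known_at_open, signal_today and signal_tomorrow are hypothetical names. The naive forecaster is a past-returns mean in the spirit of the one described above, not the library's own code.

```python
import pandas as pd

# Hypothetical daily open prices for one asset, stamped at the open.
opens = pd.Series(
    [100.0, 101.0, 99.0, 102.0, 103.0],
    index=pd.date_range("2024-01-01 09:30", periods=5, freq="D"),
    name="AAA",
)

# Open-to-open return for the period STARTING at each timestamp:
# the row at t is the return from open_t to open_{t+1}, unknown at time t.
period_returns = opens.shift(-1) / opens - 1

# Returns already realized at the open of t:
# the row at t is the return from open_{t-1} to open_t.
known_at_open = period_returns.shift(1)

# Naive forecast: the row at t averages only returns realized by the open of t.
signal_today = known_at_open.expanding().mean()

# With this naive model the forecast for the following period is the same
# number; it still carries timestamp t because it is made at t.
signal_tomorrow = signal_today.copy()

print(pd.DataFrame({"today": signal_today, "tomorrow": signal_tomorrow}))
```

Both series share the same index, the time at which the prediction is made; what differs is the future period each one refers to.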

So if I have a dataframe of signal_day_after_tomorrow (indexed by the date the signals are created), should I shift it forward by two when I feed it into the objective?

I think in your formalism (comment before) it's t+1 pred t+1 for signal today, and so on. The timestamp in the signal dataframe is such that the prediction at that timestamp is done using all data up to the price at that timestamp. For signals for the future it's the same, but you predict a future quantity.

I'm still not 100% sure, but it sounds like, if I have signal1 and signal2 whose index is the one output by a regression (so the date at which we predict the next-day target return for signal1, and that target shifted back by 1 for signal2), the correct way to use the optimizer would be something like:

mpo_policy = cvx.MultiPeriodOptimization(
    objective = [cvx.ReturnsForecast(signal1.shift(1)) - gamma_risk * cvx.FullCovariance() - ...,
                cvx.ReturnsForecast(signal2.shift(2)) - gamma_risk * cvx.FullCovariance() - ...],
    constraints = [[cvx.LeverageLimit(3)],
                [cvx.LeverageLimit(3)]]
)

I suppose what I'm confused about is whether it would instead be

mpo_policy = cvx.MultiPeriodOptimization(
    objective = [cvx.ReturnsForecast(signal1) - gamma_risk * cvx.FullCovariance() - ...,
                cvx.ReturnsForecast(signal2.shift(1)) - gamma_risk * cvx.FullCovariance() - ...],
    constraints = [[cvx.LeverageLimit(3)],
                [cvx.LeverageLimit(3)]]
)

You've got to think about the way data is consumed by the machine learning model that produces the signal. That's why you can take the user provided forecasters example as a starting point: https://github.com/cvxgrp/cvxportfolio/blob/master/examples/user_provided_forecasters.py

The row of your signal that has timestamp t must be built with data that was available at time t. That's the case for all forecasted quantities: returns for the period, for the next period, volumes, risk model parameters, ... (If you don't have that property you're doing look-ahead and any analysis is invalid.)
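
One way to test that property on your own forecaster is to perturb everything that only becomes known after some cutoff and check that the signal rows up to the cutoff do not move. The sketch below does this in plain pandas; build_signal, leaky_signal and has_lookahead are hypothetical helpers, the data is random, and returns are assumed to be indexed by the start of each period (so the value at t is only known at t+1).

```python
import numpy as np
import pandas as pd


def build_signal(returns: pd.Series) -> pd.Series:
    """Hypothetical forecaster: trailing 3-period mean of already realized returns.

    shift(1) keeps the row at t free of the return that starts at t.
    """
    return returns.shift(1).rolling(3).mean()


def leaky_signal(returns: pd.Series) -> pd.Series:
    """Deliberately wrong variant: the row at t uses the return starting at t."""
    return returns.rolling(3).mean()


def has_lookahead(forecaster, returns: pd.Series, cutoff) -> bool:
    """Perturb data that becomes known after `cutoff`; report whether rows up to `cutoff` change."""
    perturbed = returns.copy()
    perturbed.loc[perturbed.index >= cutoff] += 1.0
    before = forecaster(returns).loc[:cutoff]
    after = forecaster(perturbed).loc[:cutoff]
    return not before.equals(after)


rng = np.random.default_rng(0)
returns = pd.Series(
    rng.normal(0.0, 0.01, 10),
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
)
cutoff = returns.index[6]

print("build_signal looks ahead:", has_lookahead(build_signal, returns, cutoff))
print("leaky_signal looks ahead:", has_lookahead(leaky_signal, returns, cutoff))
```

The check only covers the one cutoff it is given, so a clean result is evidence rather than proof; running it over several cutoffs makes it more convincing.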

An improved explanation of the multi-period optimization model was just merged into master and can be seen in the development version of the docs: https://www.cvxportfolio.com/en/master/manual.html#multi-period-optimization

Hi @enzbus, I'm trying to generate my own MPO backtest example since the one in the docs is not yet complete. Currently I am getting a MissingTimesError that isn't caused by a timezone mismatch. In my case it is probably a misspecification between the ReturnsForecast dataframes and my user provided market data. In any case, do you have an estimate for when the example may be complete? I could work up an MRE if you think there could be a bug in the loop over trading calendar indices. Thanks

Does the section in the manual (https://www.cvxportfolio.com/en/stable/manual.html#multi-period-optimization) or this discussion (#139) help? Generally what people have trouble with is making sure indexing is done by the time of execution (in the back-test sense), not the time of the prediction.

Yes, I have read the full docs and the full paper. My forecast dataframes and market-data dataframes are both indexed by open timestamps. The forecast values correspond to the forecasted returns in the periods that follow, while the market-data returns dataframe contains the actual returns in the periods that follow.

For example, by my formalism with a planning period of 3, signal_today_dataframe has index open_t and the forecast corresponds to the forecasted return between open_t and open_t+1, signal_next_dataframe has index open_t and the forecast corresponds to the forecasted return between open_t+1 and open_t+2, and finally signal_next_next_dataframe has index open_t and the forecast corresponds to the forecasted return between open_t+2 and open_t+3.

Regarding market data, my returns dataframe also has index open_t, and the return value corresponds to the actual return between open_t and open_t+1; its index (open_t) is identical to each forecast dataframe index. Is this structure the correct expectation per the API, and is it true that all forecast dataframes need to have an identical index to each market-data dataframe?

My desire for an example with data is simply that it could help clear up any ambiguity in the expectation of mutual structure between the forecasts and market-data but answers to the above should be enough to get me there eventually.

Thanks

Thanks for the clarifications, and yes that is indeed the correct format. I don't have an ETA for the complete restoration of the 2016 example notebooks, but if you paste a trace-back of the specific error you're getting I might help.
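
For anyone landing here with a similar error, the format confirmed above can be sketched and sanity-checked in plain pandas before building the policy. This is a diagnostic under assumptions, not cvxportfolio API: long_preds, signals, returns and missing_times are hypothetical names, the values are made up, and whether a gap like this is what triggers MissingTimesError in a given setup is not confirmed in this thread.

```python
import pandas as pd

times = pd.date_range("2024-01-01 09:30", periods=4, freq="D")

# Hypothetical long-format predictions: one row per (timestamp, symbol).
# pred_1 forecasts open_t -> open_{t+1}, pred_2 forecasts
# open_{t+1} -> open_{t+2}, pred_3 forecasts open_{t+2} -> open_{t+3}.
long_preds = pd.DataFrame({
    "t": times.repeat(2),
    "symbol": ["AAA", "BBB"] * 4,
    "pred_1": 0.001,
    "pred_2": 0.002,
    "pred_3": 0.003,
})

# One wide dataframe per MPO step, all indexed by the time the prediction is made.
signals = [
    long_preds.pivot(index="t", columns="symbol", values=col)
    for col in ["pred_1", "pred_2", "pred_3"]
]

# Hypothetical market returns, indexed by the same open timestamps.
returns = pd.DataFrame(0.0, index=times, columns=["AAA", "BBB"])


def missing_times(signal: pd.DataFrame, returns: pd.DataFrame) -> pd.Index:
    """Timestamps present in the market data but absent from the forecast."""
    return returns.index.difference(signal.index)


for step, signal in enumerate(signals):
    print(step, list(missing_times(signal, returns)))
```

An empty list for every step means each forecast dataframe covers every timestamp of the market data; any timestamp printed is a row the forecast is missing.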

I will generate an MRE if I continue to have trouble. Thank you for the verification!