jinglescode/time-series-forecasting-pytorch

It seems like the predicted price is off by 1 day on IBM, or maybe I misunderstood something

deriklogov opened this issue · 9 comments

Hi,
I was very impressed with the IBM prediction graph, but then I noticed that the predicted curve trails the actual data by one day, so I decided to print out a slice of the data with this:

print("Actual/Predicted prices:")
for i in range(len(data_close_price)):
    print(str(data_close_price[i]) + '\t ' + str(to_plot_data_y_train_pred[i]) + '\t' + str(to_plot_data_y_val_pred[i]))

and here is a piece of the output (I grabbed a stretch where the stock went up to make it clear):
data_close_price    to_plot_data_y_train_pred    to_plot_data_y_val_pred
61.5541637158       62.48051591375582            None
63.1859297539       61.81105258680695            None
65.5087283847       63.22269362984531            None
65.2532205353       65.42579108472087            None
70.2646585813       65.24432286147622            None
69.5387840091       70.2044966634442             None

Let's check the 2nd row: data_close_price shows 63.1859 while the predicted price is 61.8110,
and then on the next day data_close_price is 65.508 and the prediction is 63.222,
so it seems that the predicted price is one day behind?
Please advise, maybe I don't understand it.

To be clear, to me it looks like each day's predicted price is very close to the previous day's actual price.
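One way to check this observation numerically is to compare the predictions against the same-day actual price and against the previous day's actual price. The sketch below is only an illustration: it uses the six rows printed above, and the helper name `mean_abs_error` is made up for this example and is not part of the repository.

```python
# Values copied from the output table in this issue.
actual = [61.5541637158, 63.1859297539, 65.5087283847,
          65.2532205353, 70.2646585813, 69.5387840091]
predicted = [62.48051591375582, 61.81105258680695, 63.22269362984531,
             65.42579108472087, 65.24432286147622, 70.2044966634442]


def mean_abs_error(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Compare each prediction with the actual price on the same day ...
same_day_mae = mean_abs_error(predicted[1:], actual[1:])
# ... and with the actual price one day earlier.
lagged_mae = mean_abs_error(predicted[1:], actual[:-1])

print(f"MAE vs same-day actual:     {same_day_mae:.3f}")
print(f"MAE vs previous-day actual: {lagged_mae:.3f}")
```

On these rows the error against the previous day's price comes out far smaller than the error against the same day's price, which is what "one day behind" looks like in numbers. Six rows are of course too few to prove anything about the full series.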

Same problem. I don't get it.
(attached image: Figure_4)

I am just leaving a comment here to catch your solution. I am also working on this problem right now.

How did you get on with this? I am looking at the same problem.

OK, my idea is that the model is cheating. It gives you back a value very close to the previous day's, since that is the closest it can get to the real value without actually predicting anything.
I tried different networks and the result was usually the same. I double-checked every variable, label, etc. That wasn't the problem.

Perhaps a little late, but...
Making price predictions based on past prices is barely feasible (I think generally impossible). The first thing that stands out is that the predictions are perfect (but "shifted"). The relationship in the data is not strong enough for the model to make such good predictions.
If you want at least some predictions based on previous prices, you can try a step of a week or a month instead of a day. In that case, a stable market can give a reasonable result (when was the market last stable?).
If you still want to analyse (whether by day or by week does not matter), then add at least a hundred economic parameters to the model. The hard part of learning is not writing code (with libraries like PyTorch or TensorFlow that is not very difficult), but preparing the data. Alpha Vantage offers a handy API, but you also need data from, for example, a statistical bureau. Be sure to watch all the reports on the US economy (there are economic calendars, although APIs for historical data are paid almost everywhere), and to a lesser extent on China, Japan and the EU.
In general, machine learning is not a panacea. If you have several trillion parameters, then maybe it will work. Right now you can add a hundred parameters and see whether there is any prediction at all (it does not matter if it is inaccurate). I have not tried it, but you could use FinGPT, for example, to analyze non-numeric data. Then give it your predicted numbers and listen to its advice for the next day.
UPD. I forgot to say: do not analyze absolute values (even normalized ones). That is completely useless, in my opinion. Analyze the relative change instead. For example: value_rel = (value_cur - value_prev) / value_prev

@egorpes You can also simplify (current - previous) / previous to current / previous - 1.0
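The two forms of the relative change mentioned above can be sketched as follows. The function names are hypothetical and the prices are the six closing values from the table at the top of this issue; this is just an illustration that both expressions give the same numbers up to floating-point rounding.

```python
def relative_changes(prices):
    """Return (current - previous) / previous for each consecutive pair."""
    return [(cur - prev) / prev for prev, cur in zip(prices, prices[1:])]


def relative_changes_simplified(prices):
    """Same quantity written as current / previous - 1.0."""
    return [cur / prev - 1.0 for prev, cur in zip(prices, prices[1:])]


# Closing prices copied from the output table earlier in this issue.
prices = [61.5541637158, 63.1859297539, 65.5087283847,
          65.2532205353, 70.2646585813, 69.5387840091]

# Print both versions side by side as percentages.
for a, b in zip(relative_changes(prices), relative_changes_simplified(prices)):
    print(f"{a:+.4%}  {b:+.4%}")
```

Note that a series of n prices yields n - 1 relative changes, so the targets and inputs need to be re-aligned if the model is trained on changes instead of prices.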

Hi, I would like to know if someone can objectively and comprehensively explain the main reasons why the model returns the last value of the input sequence rather than the next "predicted" one. What I see is this: during training the model visibly converges toward the predicted values, but on validation everything changes dramatically. My own explanation is that the backward (backpropagation) step and the overall training setup give the network the ability to cheat. It may be worth approaching the problem from the other side or with a completely different learning method, for example a classification task that determines not a specific value but the direction of the trend, perhaps through Q-learning. I'm very interested to see if anyone has made progress in this regard.
I would also like to disagree with the post above that this problem cannot be solved at all within machine learning. From my point of view, there is no need to match the price 100%; it is enough to be right in more than 50% of cases. News is fully reflected in the charts themselves in the form of signals, which lets you stay within the limits of technical analysis. A good example is a human trader: he tries to recognize patterns and predict the next trend. Of course he does not do it perfectly, but there are successful cases and they are not based on luck. The point of using machine learning here is to minimize human error as much as possible and make trading more or less stable in terms of forecasts.
Of course, the task is quite difficult to understand, and I think that relying on a single neural network to solve it is somewhat foolish; perhaps it is worth backing it up with a set of interconnected networks and algorithms. Thank you for reading.
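To make the classification framing above concrete, here is a minimal sketch of how up/down labels and a "right in more than 50% of cases" score could be computed. The names `direction_labels` and `hit_rate` are invented for this example, the prices are the six closing values from the table at the top of the issue, and the "always up" predictor is only a trivial baseline, not a model.

```python
def direction_labels(prices):
    """1 if the next price is higher than the current one, else 0."""
    return [1 if nxt > cur else 0 for cur, nxt in zip(prices, prices[1:])]


def hit_rate(predicted_labels, true_labels):
    """Fraction of days on which the predicted direction was correct."""
    hits = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return hits / len(true_labels)


# Closing prices copied from the output table earlier in this issue.
prices = [61.5541637158, 63.1859297539, 65.5087283847,
          65.2532205353, 70.2646585813, 69.5387840091]

true_labels = direction_labels(prices)

# Trivial baseline: always predict "up". A classifier would replace this.
always_up = [1] * len(true_labels)

print("labels:  ", true_labels)
print("hit rate:", hit_rate(always_up, true_labels))
```

Any real classifier should be compared against trivial baselines like this one, since in a rising market "always up" can already score above 50% without predicting anything.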

From the perspective of the neural network:
Since the next timepoint can be either lower or higher than the present value, betting in either direction would give a larger loss. The emphasis is on betting, because the model sees no indicator that could be used to predict the price change.

You can also look at this from another perspective. The model sees nothing that indicates a change in either direction. This means the next price depends largely on the current price plus an external factor that is not included in the training dataset, so the current price is the only thing the model can use.
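The argument above can be illustrated with a small simulation. This is a sketch under a strong assumption: the "prices" are a synthetic random walk with zero-mean Gaussian steps, standing in for a series whose changes cannot be predicted from its past. It is not the IBM data and not the repository's model; `mse_with_offset` is a made-up helper.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Simulated random walk: an assumption standing in for a price series whose
# changes cannot be predicted from past prices.
steps = [random.gauss(0.0, 1.0) for _ in range(10_000)]
prices = [100.0]
for s in steps:
    prices.append(prices[-1] + s)


def mse_with_offset(prices, offset):
    """MSE of the forecast 'next price = current price + offset'."""
    errors = [(nxt - (cur + offset)) ** 2
              for cur, nxt in zip(prices, prices[1:])]
    return sum(errors) / len(errors)


mse_hold = mse_with_offset(prices, 0.0)    # repeat the current price
mse_up = mse_with_offset(prices, 0.5)      # always bet on a rise
mse_down = mse_with_offset(prices, -0.5)   # always bet on a fall

print(f"hold: {mse_hold:.3f}  up: {mse_up:.3f}  down: {mse_down:.3f}")
```

Under this assumption, repeating the current price gives the lowest mean squared error of the three, so a network trained with an MSE loss is pushed toward exactly the "shifted by one day" output discussed in this issue.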

I am not great at explaining it, but trust me, time series analysis is not suitable for trading. It can be used to predict, for example, room temperature if you know how long the heater has been on. But in that case you know the external factor that changes the room temperature.

I recommend classification and reinforcement learning for price prediction. I've done this for a long time and tried almost everything that can be tried. Nothing helps.