lucasrea/StockForecast

some questions


Hello, first off, thanks a lot for this project; I learned a lot. Sorry if this isn't the appropriate place to post this, but I have a question I was hoping you could answer.

In your cross-validation method (cell [29] in the notebook), I was wondering whether each pass of the while loop creates a new model, or whether it builds on previous passes of the data. For example, you have these lines:

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=7 * len(X) // 10, shuffle=False)

rf_model = _train_random_forest(X_train, y_train, X_test, y_test)
knn_model = _train_KNN(X_train, y_train, X_test, y_test)
ensemble_model = _ensemble_model(rf_model, knn_model, X_train, y_train, X_test, y_test)

rf_prediction = rf_model.predict(X_test)
knn_prediction = knn_model.predict(X_test)
ensemble_prediction = ensemble_model.predict(X_test)

Are rf_model, knn_model, and ensemble_model being trained on nothing but the 40 or so pieces of data we specified earlier, or are they somehow remembering things from previous passes? In other words, was the roughly 70% accuracy that you attained based on training a single model over many passes and taking its average accuracy, or on training many models and then averaging their accuracies?
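For reference, here is a minimal, self-contained sketch of how I currently read it, with synthetic data and a plain RandomForestClassifier standing in for your helper functions: each pass builds a fresh model that only ever sees its own window, and the reported figure would then be the average over those independent models.

# Minimal sketch, not the notebook's code: synthetic data, fresh model every pass.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
accuracies = []
for _ in range(5):                                   # stand-in for the while loop
    X = rng.normal(size=(60, 4))                     # stand-in for one window of data
    y = (X[:, 0] > 0).astype(int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=7 * len(X) // 10, shuffle=False)

    rf_model = RandomForestClassifier()              # fresh, untrained model each pass
    rf_model.fit(X_train, y_train)                   # sees only this window's data
    accuracies.append(accuracy_score(y_test, rf_model.predict(X_test)))

print(np.mean(accuracies))                           # average over independent models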

I'm asking all this because I want to make predictions using that model but am not sure how. In your Medium post you did something like:

del(live_pred_data['close'])
prediction = ensemble_model.predict(live_pred_data)
print(prediction)

But I'm not sure how to do this, since ensemble_model is only defined inside the while loop, so I can't access it in another cell. Did you evaluate the predictions inside the while loop? If not, where did this model variable come from?
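For context, this is roughly what I imagined doing; a minimal sketch with synthetic data, where live_pred_data and the column names are just placeholders: keep a reference to the model trained in the last pass of the loop and use it in a later cell.

# Minimal sketch, not the notebook's code: the last fitted model survives the loop.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

cols = ['f1', 'f2', 'f3', 'f4']                       # placeholder feature names
rng = np.random.default_rng(0)

last_model = None
for _ in range(3):                                    # stand-in for the while loop
    X = pd.DataFrame(rng.normal(size=(60, 4)), columns=cols)
    y = (X['f1'] > 0).astype(int)
    last_model = RandomForestClassifier().fit(X, y)   # reference kept outside the loop

# Later cell: predict on unseen "live" rows with the last trained model.
live_pred_data = pd.DataFrame(rng.normal(size=(5, 5)), columns=cols + ['close'])
del live_pred_data['close']                           # drop the target column, as in the post
print(last_model.predict(live_pred_data))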

Sorry for the long post, and thanks again

Hi,

Thanks for your comments and questions.

After testing, I have found that defining the models at the beginning of the while loop (effectively creating a new model to be trained at every iteration) versus keeping the same model produces very similar results (though for a different stock this may not be the case). However, because I think it makes more sense to have one model rather than creating a new one at every iteration, I will take your suggestion and modify the code.
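For clarity, here is a minimal sketch of the change I have in mind, assuming plain scikit-learn estimators (the synthetic data and variable names are illustrative, not the exact notebook code): the models are created once before the loop and refit on each window. Note that a plain .fit() retrains from scratch on each pass; earlier passes would only carry over with something like warm_start=True.

# Minimal sketch: models defined once, refit inside the loop.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rf_model = RandomForestClassifier()       # defined once, outside the loop
knn_model = KNeighborsClassifier()

rng = np.random.default_rng(0)
for _ in range(5):                        # stand-in for the while loop
    X = rng.normal(size=(60, 4))          # stand-in for one window of data
    y = (X[:, 0] > 0).astype(int)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=7 * len(X) // 10, shuffle=False)
    rf_model.fit(X_train, y_train)        # .fit() retrains from scratch each pass
    knn_model.fit(X_train, y_train)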

Also, yes, I believe the last part was there to validate the model on future days not seen during the training process.

Thanks again