In this module, I learned how to use Pandas and the JupyterLab IDE to collect, prepare and analyze financial data.
For this Challenge, I assume the role of a financial advisor at one of the top five financial advisory firms in the world. My firm constantly competes with the other major firms to manage and automatically trade assets in a highly dynamic environment. In recent years, my firm has profited heavily from computer algorithms that can buy and sell faster than human traders. The speed of these transactions gave my firm a competitive advantage early on, but people still need to program these systems by hand, which limits their ability to adapt to new data. I therefore plan to improve the existing algorithmic trading systems and maintain the firm's competitive advantage in the market. To do so, I'll enhance the existing trading signals with machine learning algorithms that can adapt to new data, combining algorithmic trading skills with financial Python programming and machine learning to create a trading bot that learns and adapts to new data and evolving markets.
- Establish a Baseline Performance
- Tune the Baseline Trading Algorithm
- Evaluate a New Machine Learning Classifier
- Create an Evaluation Report
- Run through each line of code to view the output
This challenge assignment encompasses two distinct tasks:
- Create a good trading strategy (i.e., good trading signals) that outperforms the buy-and-hold approach.
- Create a predictive model that's capable of forecasting the trading signals that implement the strategy.
The tasks are distinct because it's possible to do one of them without doing the other. A good trading strategy will backtest better than buying and holding. A good model will faithfully predict the signals that are used to implement the trading strategy. If the trading strategy is good but the model is bad, then the signals to buy/sell are appropriate, but the model isn't predicting the signals very well. If the trading strategy is bad but the model is good, then the signals to buy/sell aren't appropriate, but the model will predict the signals with a high degree of accuracy. That distinction is important because the trading strategy and the predictive model have to be evaluated independently of each other.
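As a minimal sketch of the first task, a pandas approach to generating trading signals from returns might look like the following. The price data here is synthetic and the column names are illustrative assumptions, not the ones from the actual starter code:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the ETF's closing prices (a random walk around 100).
np.random.seed(42)
close = pd.Series(100 + np.random.randn(250).cumsum(), name="close")

# Daily percentage returns of the asset (the "Actual Returns").
actual_returns = close.pct_change()

# A simple signal rule: go long (1) when the return is non-negative,
# go short (-1) otherwise.
signal = np.where(actual_returns >= 0, 1, -1)

signals_df = pd.DataFrame(
    {"close": close, "actual_returns": actual_returns, "signal": signal}
).dropna()

print(signals_df["signal"].value_counts())
```

Any signal rule like this defines the strategy; whether it beats buy-and-hold is a separate question from whether a model can learn to predict it.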
The efficacy of the trading strategy is determined by the financial performance of the Strategy Returns (depicted in most of the plots below with an orange line) relative to the Actual Returns (depicted in most of the plots below with a blue line). The Actual Returns represent the buy-and-hold strategy of purchasing the exchange-traded fund (ETF) and selling the shares at a later date. The Strategy Returns represent a combination of buying and selling short the ETF based on the trading signals we've created in the dataset.
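The two return series can be computed as below. This is a sketch under the usual backtesting convention that a signal decided on day t applies to day t+1's return (hence the `shift()`); the variable names are assumptions rather than the starter code's:

```python
import numpy as np
import pandas as pd

# Synthetic daily returns standing in for the ETF's Actual Returns.
np.random.seed(0)
actual_returns = pd.Series(np.random.randn(500) * 0.01, name="actual_returns")

# Signal decided at the close of day t applies to day t+1's return;
# the shift() avoids look-ahead bias.
signal = pd.Series(np.where(actual_returns >= 0, 1, -1))
strategy_returns = actual_returns * signal.shift()

# Cumulative growth of $1 under each approach -- these are the blue
# (Actual) and orange (Strategy) lines in the plots.
cumulative_actual = (1 + actual_returns).cumprod()
cumulative_strategy = (1 + strategy_returns).cumprod()
```

Plotting `cumulative_actual` against `cumulative_strategy` reproduces the style of comparison used throughout this report.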
From the confusion matrix and classification report, it's clear that the predictive model nearly always predicts a buy signal. The model only generated 161 sell signals out of a total of 4092 opportunities (3.93%), even though there were 1804 instances (44.09%) of actual sell signals in the test data set. Because the model nearly always suggests buying, one would expect that the model performance greatly resembles a buy-and-hold strategy:
The performance of the buy-and-hold strategy is given by the blue line on the plot (Actual Returns). However, the Strategy Returns, as defined by the starter code, look nothing like the buy-and-hold approach. That's because the trading signals for the Strategy Returns aren't being generated by the model; they're being generated by the data that's testing the model. Obviously the Strategy Returns don't perform very well, but that's just because the approach for creating the signals isn't very good.
In summary, the blue line is the buy-and-hold value of the ETF over time. The orange line is the cumulative value of the trading approach we're trying to implement.
It turns out that the trading strategy in the baseline trading algorithm greatly underperforms the buy-and-hold value of the ETF.
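The confusion-matrix evaluation described above can be reproduced with scikit-learn. The labels below are synthetic stand-ins chosen only to mimic the reported imbalance (161 predicted sells out of 4092, against 1804 actual sells), not the project's real test set:

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Synthetic test labels: ~44% actual sell signals (-1), matching the
# reported 1804/4092 proportion.
rng = np.random.default_rng(1)
y_test = rng.choice([-1, 1], size=4092, p=[0.44, 0.56])

# A model that nearly always predicts "buy", selling only ~3.9% of the time.
y_pred = np.where(rng.random(4092) < 0.039, -1, 1)

cm = confusion_matrix(y_test, y_pred, labels=[-1, 1])
print(cm)
print(classification_report(y_test, y_pred, labels=[-1, 1]))
```

With predictions this lopsided, the recall on the sell class collapses while the buy class looks deceptively healthy, which is exactly why the model's equity curve tracks buy-and-hold.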
Case 2 is identical to the baseline trading algorithm (Case 1), except the classifier is a logistic regression model instead of an SVC model. The results are as follows:
Unlike the baseline trading algorithm, this model predicts far more sell signals. The baseline trading algorithm only predicted 3.93% (161/4092) sell signals, so it ultimately performed like a buy-and-hold trading strategy. This model, with the logistic regression classifier, predicts 33.58% (1374/4092) sell signals, so the model is behaving very differently, even if its accuracy is marginally worse than the baseline trading algorithm. Practically, the difference between the logistic regression classifier and the SVC classifier is that the misclassifications are far more balanced using logistic regression:
Logistic Regression (Case 2):
- Accuracy: 52%
- Errors:
  - 1201 false positives
  - 771 false negatives

SVC (Case 1):
- Accuracy: 55%
- Errors:
  - 1735 false positives
  - 92 false negatives
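Swapping the SVC classifier for logistic regression is a one-estimator change in scikit-learn. The features below are toy data standing in for the scaled SMA features used in the notebook, so the variable names are illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy feature matrix standing in for the scaled SMA features;
# in the project these would be X_train / y_train from the notebook.
rng = np.random.default_rng(7)
X_train = rng.normal(size=(200, 2))
y_train = np.where(X_train[:, 0] + rng.normal(scale=0.5, size=200) >= 0, 1, -1)
X_test = rng.normal(size=(50, 2))

# Scale features on the training set only, then apply to both sets.
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Only the estimator changes between Case 1 (SVC) and Case 2 (logistic regression).
svc_pred = SVC().fit(X_train_s, y_train).predict(X_test_s)
lr_pred = LogisticRegression().fit(X_train_s, y_train).predict(X_test_s)
```

Because both estimators share the `fit`/`predict` interface, everything downstream (confusion matrix, classification report, return plots) stays identical, which makes the two cases directly comparable.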
That outcome is easy to see in the financial performance plot:
On the above plot, the SVC Model Returns (green line) are simply the Model Returns from the baseline trading algorithm (Case 1), carried over for comparison. The LR Model Returns are the Model Returns using the logistic regression classifier instead of the SVC.
Even though the accuracies of the logistic regression model and the SVC model are quite close to each other, their financial performance is very different. The baseline trading algorithm (SVC Model Returns) largely resembled a buy-and-hold approach, whereas the LR Model diverges from the buy-and-hold approach much more frequently. Once again, the gains and losses are largely coincidental, because the SVC and LR Model Returns are trying (and failing) to replicate the Strategy Returns (orange line). But from this exercise, it's easy to see how changing only the classification model can greatly affect the model's performance, even when the models are trained and tested on the same underlying data.
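A sketch of how such a comparison DataFrame can be built. The returns are synthetic and the signal rates only mimic the reported sell-signal frequencies (3.93% for SVC, 33.58% for LR); the real plot uses each model's actual predictions on the test set:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n = 4092
actual_returns = pd.Series(rng.normal(0, 0.01, n))

# Stand-ins for each model's predicted signals on the test set;
# alignment/shifting of the signals is assumed handled upstream.
svc_signal = pd.Series(np.where(rng.random(n) < 0.0393, -1, 1))  # ~buy-and-hold
lr_signal = pd.Series(np.where(rng.random(n) < 0.3358, -1, 1))   # far more sells

comparison = pd.DataFrame({
    "Actual Returns": (1 + actual_returns).cumprod(),
    "SVC Model Returns": (1 + actual_returns * svc_signal).cumprod(),
    "LR Model Returns": (1 + actual_returns * lr_signal).cumprod(),
})
# comparison.plot() reproduces the style of the three-line comparison chart.
```

Because the SVC signal is almost always 1, its column hugs the Actual Returns column, while the LR column diverges whenever its more frequent sell signals flip the sign of a day's return.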
The trading strategy's success hinges not only on the predictive model's accuracy but also on its ability to generate balanced and accurate trading signals. While accuracy is crucial, the model's adaptability to market conditions and its ability to generate actionable signals are equally important. The evaluation underscores the significance of considering both components independently to gauge the overall effectiveness of algorithmic trading strategies. In this context, the logistic regression model shows promise for improving trading strategy performance by providing more balanced and actionable signals compared to the baseline approach. However, further refinement and testing may be necessary to optimize both the predictive model and the trading strategy for real-world application.