This study aims to develop machine learning models that analyze stock trading risk and support informed decisions on whether to stay in the market or exit it, that is, whether to buy or sell the stock.
I used daily MSFT (Microsoft Corporation) stock data from Yahoo Finance, covering 1986/03/14 to 2022/12/20.
Date | Open | High | Low | Close | Adj Close | Volume |
---|---|---|---|---|---|---|
1986-03-14 | 0.097222 | 0.102431 | 0.097222 | 0.100694 | 0.062980 | 308160000 |
1986-03-17 | 0.100694 | 0.103299 | 0.100694 | 0.102431 | 0.064067 | 133171200 |
1986-03-18 | 0.102431 | 0.103299 | 0.098958 | 0.099826 | 0.062437 | 67766400 |
1986-03-19 | 0.099826 | 0.100694 | 0.097222 | 0.098090 | 0.061351 | 47894400 |
1986-03-20 | 0.098090 | 0.098090 | 0.094618 | 0.095486 | 0.059723 | 58435200 |
.... | .... | .... | .... | .... | .... | .... |
2022-12-13 | 261.690002 | 263.920013 | 253.070007 | 256.920013 | 256.920013 | 42196900 |
2022-12-14 | 257.130005 | 262.589996 | 254.309998 | 257.220001 | 257.220001 | 35410900 |
2022-12-15 | 253.720001 | 254.199997 | 247.339996 | 249.009995 | 249.009995 | 35560400 |
2022-12-16 | 248.550003 | 249.839996 | 243.509995 | 244.690002 | 244.690002 | 86088100 |
2022-12-19 | 244.860001 | 245.210007 | 238.710007 | 240.449997 | 240.449997 | 29668800 |
Because the dataset is indexed by an ordered date column (one row per trading day), we can treat it as a time series.
The Open Price
The Open price represents the price at which a stock was first traded during the current trading session.
The Close Price
The Close price represents the price at which a stock was last traded during the current trading session.
The High Price
The High price represents the highest price at which a stock was traded during the current trading session.
The Low Price
The Low price represents the lowest price at which a stock was traded during the current trading session.
The Open and Close prices show the direction of the stock during a session, while the High and Low prices show its volatility:
- If the Close price is higher than the Open price, the stock rose over the trading session, indicating a bullish trend.
- If the Close price is lower than the Open price, the stock fell over the trading session, indicating a bearish trend.
- The spread between the High and Low prices reflects volatility: a large spread means the stock was highly volatile during the trading session, and a small spread means it was relatively calm.
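As a small illustration of the two ideas above, the sketch below flags each session as bullish or bearish and measures the High-Low spread, both in price units and relative to the Open price. Dividing by the Open is my own assumption, added so that 1986 prices (around $0.10) and 2022 prices (around $250) are comparable; the function name `session_stats` is hypothetical. The two sample rows are copied from the table above.

```python
import pandas as pd


def session_stats(df: pd.DataFrame) -> pd.DataFrame:
    """Add per-session direction and High-Low spread columns to an OHLC frame."""
    out = df.copy()
    # Bullish session: the stock closed above the price at which it opened.
    out["Bullish"] = (out["Close"] > out["Open"]).astype(int)
    # Absolute spread, in the same units as the prices.
    out["Spread"] = out["High"] - out["Low"]
    # Relative spread (assumption): spread as a fraction of the Open price,
    # so sessions from very different price levels can be compared.
    out["SpreadPct"] = out["Spread"] / out["Open"]
    return out


# Two rows copied from the table above.
sample = pd.DataFrame(
    {
        "Open": [0.097222, 261.690002],
        "High": [0.102431, 263.920013],
        "Low": [0.097222, 253.070007],
        "Close": [0.100694, 256.920013],
    },
    index=pd.to_datetime(["1986-03-14", "2022-12-13"]),
)
print(session_stats(sample)[["Bullish", "SpreadPct"]].round(4))
```

The 1986-03-14 session is bullish with a relative spread of roughly 5.4%, while the 2022-12-13 session is bearish with roughly 4.1%, even though its absolute spread is thousands of times larger.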
Our analysis works at both a monthly and a daily granularity, and all decisions are made on the first trading day of the month. For a reason that will become clear in the code, the analysis starts 24 months after January 1986 (that is, in January 1988) and ends in the month before November 2022 (October 2022).
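A minimal pandas sketch of the monthly view, assuming the daily data sits in a DataFrame indexed by date: `resample("MS")` collapses each calendar month into one bar whose Open is the open of the month's first trading day, which matches the decision point described above. The 24-month warm-up itself is explained in the author's code, which is not reproduced here, so the sketch only applies the January 1988 to October 2022 window. The helper `to_monthly` and the random-walk prices are hypothetical placeholders, not MSFT data.

```python
import numpy as np
import pandas as pd


def to_monthly(daily: pd.DataFrame) -> pd.DataFrame:
    """Collapse daily OHLCV rows into one row per calendar month."""
    monthly = daily.resample("MS").agg(
        {
            "Open": "first",   # open of the month's first trading day
            "High": "max",     # highest price reached during the month
            "Low": "min",      # lowest price reached during the month
            "Close": "last",   # close of the month's last trading day
            "Volume": "sum",   # total shares traded in the month
        }
    )
    # A month without any trading rows would have no Open; drop such rows.
    return monthly.dropna(subset=["Open"])


# Hypothetical placeholder data: a random walk on business days, NOT MSFT prices.
rng = np.random.default_rng(0)
days = pd.bdate_range("1986-03-14", "2022-12-19")
close = 0.1 * np.exp(np.cumsum(rng.normal(0.0005, 0.02, len(days))))
daily = pd.DataFrame(
    {
        "Open": close,
        "High": close * 1.01,
        "Low": close * 0.99,
        "Close": close,
        "Volume": rng.integers(1_000_000, 5_000_000, len(days)),
    },
    index=days,
)

monthly = to_monthly(daily)
# Keep January 1988 (24 months after January 1986) through October 2022.
analysis = monthly.loc["1988-01":"2022-10"]
print(analysis.head())
```

Note that `"MS"` labels each bar with the first calendar day of the month, which is not always a trading day; the Open value still comes from the first actual trading row.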
Next, I selected the columns used for the candlestick chart: "Open", "High", "Low", and "Close".
If we plot the dataset as a candlestick chart, we can see how Microsoft's stock price has varied over time. Our objective is to predict, at the start of each trading period, whether to stay in the market or leave it.
In the image below, I create a new column, 'Target', that specifies whether the stock rose or fell during the current trading session. I compute the difference between the Close and Open prices: if it is positive, the session had a bullish trend (Target = 1); otherwise it had a bearish trend (Target = 0).
This target describes the current period, but we want to predict the next one: will it be bullish or bearish?
To do this, we shift all the Target values up by one row, so that each period's target now says whether the next trading period will be bullish or bearish. The most recent period has no next period yet, so its target is missing and that row cannot be used for training.
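The labelling and shifting steps can be sketched in pandas as follows. This is a hedged reconstruction, not the author's code: `add_next_period_target` is a hypothetical name, flat sessions (Close equal to Open) count as 0 because their difference is not positive, and the five sample rows are the first rows of the table above.

```python
import pandas as pd


def add_next_period_target(df: pd.DataFrame) -> pd.DataFrame:
    """Label each row with the direction of the FOLLOWING period."""
    out = df.copy()
    # Current-period direction: 1 if Close - Open is positive (bullish), else 0.
    current = (out["Close"] - out["Open"] > 0).astype(int)
    # Shift up by one row so that row t holds the direction of period t + 1.
    out["Target"] = current.shift(-1)
    # The most recent row has no next period yet, so its Target is NaN; drop it.
    return out.dropna(subset=["Target"]).astype({"Target": int})


# First five rows of the table above (Open and Close only).
sample = pd.DataFrame(
    {
        "Open": [0.097222, 0.100694, 0.102431, 0.099826, 0.098090],
        "Close": [0.100694, 0.102431, 0.099826, 0.098090, 0.095486],
    },
    index=pd.to_datetime(
        ["1986-03-14", "1986-03-17", "1986-03-18", "1986-03-19", "1986-03-20"]
    ),
)
print(add_next_period_target(sample))
```

For example, the 1986-03-14 row gets Target = 1 because the following session (1986-03-17) closed above its open, and the 1986-03-20 row disappears because the table sample contains no session after it.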
Next, I trained machine learning and deep learning models to predict, for new data, whether we should stay in the market or not.
Using Random Forest ...
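The Random Forest details are elided here, so the following is only a sketch of a baseline under stated assumptions: scikit-learn's `RandomForestClassifier`, columns named after the OHLC prices as features, and a chronological 80/20 split (no shuffling), so the model is never trained on periods that come after the ones it is tested on. The data are synthetic stand-ins, and `chronological_split` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


def chronological_split(X, y, test_fraction=0.2):
    """Split by position: earlier rows for training, later rows for testing."""
    cut = int(len(X) * (1 - test_fraction))
    return X.iloc[:cut], X.iloc[cut:], y.iloc[:cut], y.iloc[cut:]


# Synthetic stand-in data (NOT the MSFT dataset) so the snippet runs on its own.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame(rng.normal(size=(n, 4)), columns=["Open", "High", "Low", "Close"])
y = pd.Series((X["Close"] + rng.normal(scale=0.5, size=n) > 0).astype(int))

X_train, X_test, y_train, y_test = chronological_split(X, y)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
acc = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {acc:.3f}")
```

A shuffled split (the default of `train_test_split`) would mix future and past periods, which tends to overstate accuracy on financial time series.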
Using Random Forest + GridSearchCV...
We notice that the results improved ...
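The tuning step can be sketched with scikit-learn's `GridSearchCV`. Two choices here are my own assumptions: the parameter grid values are illustrative, and cross-validation uses `TimeSeriesSplit`, where each fold trains on earlier rows and validates on later ones (plain k-fold splitting would let the model see future periods during training). The data are synthetic stand-ins.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

# Synthetic stand-in data (NOT the MSFT dataset).
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame(rng.normal(size=(n, 4)), columns=["Open", "High", "Low", "Close"])
y = pd.Series((X["Close"] + rng.normal(scale=0.5, size=n) > 0).astype(int))

# Illustrative grid; the author's actual grid is not given in the text.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [3, 5, None],
    "min_samples_leaf": [1, 5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=TimeSeriesSplit(n_splits=5),  # expanding window: train on past, validate on future
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

In a real run, the search should be fitted on the training portion only, so the final test period stays untouched until the model is chosen.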
Using Random Forest + GridSearchCV + Feature extraction
I calculated the logarithmic difference between consecutive prices ...
I used logarithmic differencing to normalize the data, a common technique in financial analysis for expressing price variations in relative (approximately percentage) terms rather than absolute ones.
Logarithmic differencing is especially useful for data that trend upward over time, such as stock prices.
It consists of taking the natural logarithm of each price and then differencing consecutive values, which gives the relative growth rate between periods. Because each difference is relative to the previous price, a move from 0.10 to 0.11 and a move from 250 to 275 land on the same scale, so price changes stay readable despite large long-term increases, and growth trends can be compared easily between different stocks or periods.
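A minimal numeric check of the idea, assuming the prices are in a pandas DataFrame: for each column the sketch computes `ln(P_t) - ln(P_{t-1})`. With closes of 100, 110 and 99, the simple returns are +10% and -10%, which might suggest the price ended flat, while the log differences (about 0.0953 and -0.1054) add up to ln(0.99), the true net change. `log_diff` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def log_diff(prices: pd.DataFrame) -> pd.DataFrame:
    """r_t = ln(P_t) - ln(P_{t-1}) = ln(P_t / P_{t-1}), column by column."""
    # The first row has no predecessor, so its difference is NaN and is dropped.
    return np.log(prices).diff().dropna()


prices = pd.DataFrame({"Close": [100.0, 110.0, 99.0]})
r = log_diff(prices)
print(r["Close"].round(4).tolist())
```

Applied to the dataset, something like `log_diff(df[["Open", "High", "Low", "Close"]])` (with `df` standing for the loaded data) would return one row fewer than its input, since the first day has no previous price.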
Then I normalized the data ...
Using Random Forest + GridSearchCV again, this time on the preprocessed data ...
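The normalization method is not stated, so this sketch assumes z-score standardization with scikit-learn's `StandardScaler`, placed inside a `Pipeline` together with the Random Forest. Wrapping the whole pipeline in `GridSearchCV` refits the scaler on each training fold only, so no statistics from the validation periods leak into training. The features are synthetic stand-ins for log-differenced prices, and the grid values are illustrative.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for log-differenced OHLC features (NOT real MSFT data).
rng = np.random.default_rng(7)
n = 400
X = pd.DataFrame(
    rng.normal(scale=0.02, size=(n, 4)), columns=["Open", "High", "Low", "Close"]
)
y = pd.Series((X["Close"] > 0).astype(int))

pipe = Pipeline(
    [
        ("scale", StandardScaler()),  # fitted on each training fold only
        ("rf", RandomForestClassifier(random_state=0)),
    ]
)
search = GridSearchCV(
    pipe,
    {"rf__n_estimators": [50, 100], "rf__max_depth": [3, None]},  # step__param names
    cv=TimeSeriesSplit(n_splits=4),
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_)
```

One design remark: a random forest splits each feature on thresholds, so a per-feature linear rescaling such as standardization typically leaves its predictions unchanged; the scaling matters more for the deep learning models mentioned earlier.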
Finally, I deployed the model so it can TELL US WHETHER TO STAY IN OR EXIT THE MARKET :)
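Deployment details are not given, so this is only a hedged sketch of the final decision rule: the trained classifier's predicted probability that the next period is bullish is mapped to STAY (at or above a threshold) or EXIT (below it). The function name `stay_or_exit` and the 0.5 default threshold are assumptions, and the toy model is trained on synthetic data.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def stay_or_exit(model, latest_features, threshold=0.5):
    """Map P(next period is bullish) to a STAY / EXIT decision per row."""
    # Look up the column for class 1 (bullish) instead of assuming its position.
    bullish_col = list(model.classes_).index(1)
    p_bullish = model.predict_proba(latest_features)[:, bullish_col]
    return ["STAY" if p >= threshold else "EXIT" for p in p_bullish]


# Toy model on synthetic data: "bullish" whenever the first feature is positive.
rng = np.random.default_rng(0)
X_hist = rng.uniform(-3, 3, size=(300, 2))
y_hist = (X_hist[:, 0] > 0).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_hist, y_hist)

print(stay_or_exit(model, np.array([[2.0, 0.0], [-2.0, 0.0]])))
```

Raising the threshold makes the rule more conservative: the model must be more confident of a bullish period before it recommends staying in the market.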