
Sample Databricks notebook for the 2020 Databricks / Spark Summit

Primary language: Jupyter Notebook

SparkSummit2020

In this presentation we propose a method for interpreting equity price action as a conversation between two participants, bulls and bears. We provide a way to interpret the conversation using classical technical analysis, and to predict what will happen next using machine learning. Nothing contained here is investment advice. Always check with your registered financial advisor before making personal investment decisions.

The best opportunities that exist for investors are the misperceptions of others. When investors as a group lose their way and misprice an instrument, the investors who rush in to correct that mistake beat the market. Market participants consist broadly of two groups: bulls, who insist prices will continue to rise, and bears, who insist prices will continue to fall. From their own perspectives, and in the moment, they are both right. Only one thing obscures their vision: the hard right edge. That is the right side of the chart, right now, where history ends. For fundamental investors, the hard right edge is the limit of historical data about a company: its supply chain, its ability to acquire and retain talent, the news.

There’s another way to discover misperceptions: by listening to and interpreting the conversations of market participants. This conversation and its lexicon have existed as long as the markets. Japanese candlestick charts go back to the 17th century, the same century as the Dutch tulip crisis. Today we call that language technical analysis. Just as deep learning lets Gmail predict how to finish our sentences as we type, we can use similar techniques to look ahead and predict how bulls and bears will finish their sentences in the market, based on their sentiment. Technical analysis has been extremely useful over the years for predicting short-term behavior, but it often misfires, and those misfires can be costly.

We’ll combine a Databricks notebook with powerful built-in Amazon SageMaker algorithms to make those predictions. Let’s start with the basics. There are always exactly two sides to any trade: one party decided to buy, and another decided to sell.

Behavioral Economics

Sometimes prices move from lack of participation, that is, buyers are not showing up. Then only those most inclined to sell, or perhaps those who have a directive to sell, can participate, and prices must fall low enough to find a bullish buyer. Of course, the opposite is true when the sentiment is reversed. For any behavior to happen, three things need to occur at the exact same moment. There needs to be an ability to act, which could mean either having or needing liquidity. There must be motivation, which could be either a need for capital or a desire for investment returns. Importantly, no action will ever be taken without a trigger, an event that says, “The time is now.” Technical analysis provides clear signs of what is behind these three essential dimensions of every trade.

Rational market theory states that there is perfect information in the market, and that agents act logically on that information. The behavioral school takes a different approach, one that goes back to “Extraordinary Popular Delusions and the Madness of Crowds,” a book more than 100 years old. Among the madnesses it describes is the Dutch tulip buying mania, the first case of irrationality overcoming an otherwise rational market. You don’t need to be a psychologist to understand this. Some things are pretty obvious. People remember prices. If your favorite stock consistently closes around 20, and today you see it at 19, you think it’s a bargain. You have the ability and you have the motivation; the lower price, whether justified or not, is the trigger. You’re ready to buy, and you act.

To understand technical indicators, consider a pilot who can land her aircraft either visually or entirely on instruments. Using technical analysis to interpret group behavior is very much like that pilot making an instrument-only landing. Trendlines are linear equations that join highs to other highs and lows to other lows in a trend. That trend can be up, down, or sideways.
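Because a trendline is just a linear equation, it can be written down from two swing points. The sketch below is illustrative only and is not taken from the notebook: it assumes points arrive as (trading-day index, price) pairs, and the function names and the `tolerance` used to call a slope "sideways" are hypothetical choices.

```python
from typing import Tuple

Point = Tuple[float, float]  # (trading-day index, price)
Line = Tuple[float, float]   # (slope, intercept)


def trendline(p1: Point, p2: Point) -> Line:
    """Return (slope, intercept) of the line joining two highs (or two lows)."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("the two points must fall on different days")
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1
    return slope, intercept


def project(line: Line, day: float) -> float:
    """Extend the trendline to another day index."""
    slope, intercept = line
    return slope * day + intercept


def direction(line: Line, tolerance: float = 0.01) -> str:
    """Classify the trend; `tolerance` (price units per day) is an assumed threshold."""
    slope, _ = line
    if slope > tolerance:
        return "up"
    if slope < -tolerance:
        return "down"
    return "sideways"


# Two hypothetical swing highs: 100.0 on day 0 and 110.0 on day 10.
line = trendline((0, 100.0), (10, 110.0))
print(line, direction(line), project(line, 20))  # (1.0, 100.0) up 120.0
```

Projecting the line forward gives the price at which a later high would touch it, which is the level a chartist watches for a break.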
What’s fascinating about trendlines is how simple and easily discerned they are. Remember that a moment ago I said people remember prices. This is reflected in a phenomenon where prices move in a channel for a period of time. Then there’s a breakout, which could be to the upside or the downside. Soon, prices snap back to their former levels. People are habitual. There may be good reasons for the breakout, but it’s our nature to return to safety before stepping into the unknown, and the hard right edge is the home of the unknown. Moving averages, especially front-weighted exponential moving averages (EMAs), capture gradual changes in sentiment over time. Having worked on Wall Street for many years, I have some insight into block trading and large institutional behavior.
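The channel, breakout, and snap-back described above can be checked mechanically. This is a minimal sketch under stated assumptions, not the notebook's method: it treats the highest and lowest close of a trailing window as the channel, calls a close outside it a breakout, and then asks whether the later pullback came back to the old ceiling and held there, which is how resistance becomes support. The window length, the 1% tolerance, the helper names, and the prices are all made up for illustration.

```python
from typing import List, Optional


def breakouts(closes: List[float], lookback: int) -> List[Optional[str]]:
    """Label a day "up" or "down" when its close leaves the trailing channel, else None."""
    signals: List[Optional[str]] = [None] * len(closes)
    for i in range(lookback, len(closes)):
        window = closes[i - lookback:i]  # the prior `lookback` closes only
        resistance, support = max(window), min(window)
        if closes[i] > resistance:
            signals[i] = "up"
        elif closes[i] < support:
            signals[i] = "down"
    return signals


def resistance_became_support(after_breakout: List[float], old_resistance: float,
                              tolerance: float = 0.01) -> bool:
    """True if, after an upside breakout, the pullback returned to the old
    resistance level (within `tolerance`) without closing clearly below it."""
    low = min(after_breakout)
    touched = low <= old_resistance * (1 + tolerance)
    held = low >= old_resistance * (1 - tolerance)
    return touched and held


# Made-up closes: a sideways channel between 100 and 110, then a breakout and a pullback.
closes = [100.0, 108.0, 102.0, 110.0, 101.0, 109.0, 116.0, 118.0, 112.0, 109.5, 114.0]
signals = breakouts(closes, lookback=6)
first = signals.index("up")
print(first, resistance_became_support(closes[first:], old_resistance=110.0))  # 6 True
```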

Support and Resistance

It’s much more difficult to buy 100,000 shares of any company than a few hundred. Institutional trends therefore play out over a longer period of time, within a wider band. Faster-moving individual investors and slower institutional investors are relatively easily characterized by shorter and longer exponential moving averages, respectively. Finally, the granddaddy of all trend indicators, MACD, is a highly effective gauge of trend strength. The oddly named stochastic is a hair trigger. It’s a common precursor to a trend or a change in the trend, but it frequently misfires. It rarely misses the reversal when the target is real.

Looking at this chart from left to right, along the bottom orange and middle green lines, Amazon traded between $1,685 and $1,815 from September 2019 to the end of that year. This pattern is called range-bound; it’s a sideways trend. Then something happened: there was a Christmas breakout. Breakouts are important, perhaps the most important market signal, because they indicate that the conversation has changed. Where it ends, no one really knows. Every breakout is driven by the strong conviction of the dominant group, in this case the bulls. The trend ends when the bulls get tired. The bears, who happen to own Amazon as well, take profits as good bears do, and they become the dominant group for a short time. Their end point, however, is a bit more predictable. It’s called a pullback, frequently to a well-known price, as we see at the end of January. That’s how resistance becomes support. Switch the roles around, and that’s how support becomes resistance. People are habitual.

In an uptrend, bulls have full control. Prices of the 500 largest companies by market capitalization are represented here by the ETF SPY. They were in an uptrend from September 2019 to mid-February 2020. Bears had their moments; you can think of the bouncing around within this channel as the bears saying, “Excuse me,” but they could not become the dominant group. Then, in mid-February, something happened. Bears took over, and bulls lost all conviction. Notice that at the end of that month the bulls said, “Excuse me,” but it didn’t last long. We fell to 218, and 218 now acts as support at a level that was resistance when Trump entered office.

Drawing straight lines on a chart seems like a strange way to describe a fluid conversation. That’s where EMAs come in. I have found over the decades that, when looking at the broad market, the two most effective EMAs are the 13-day and 50-day EMAs. You can adjust these for other instruments; however, their meaning is unchanged.
The 13-day EMA represents the speculators, day traders, and individual investors.

Exponential Moving Averages

They’re not in this conversation for the long haul; they just want to see it move along. When it gets boring, whether bullish or bearish, or if they’re punished for their misperceptions, they get out. The 50-day EMA is the conservative group: institutional investors, those seeking capital preservation and long-term gains. They too must assess their commitment daily and adjust by becoming either more bullish or more bearish. When those lines cross, the conversation has changed. Think of a dog chasing a rabbit. The rabbit gets a hint, and it runs. Initially the rabbit is fast, but the dog may catch up. The cross of the 13-day and 50-day EMAs is a consistent sign that the conversation has changed. Let’s dive deeper into the anatomy of a trend. If EMAs are so powerful, what would happen if we turned our entire attention to the distance between those two groups, the individual and institutional investors, or the rabbit and the dog?
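The 13-day and 50-day cross can be computed directly from daily closes. A minimal sketch, assuming the common smoothing factor alpha = 2 / (N + 1) and seeding each EMA with the first close (seeding with a simple average is another common convention, and early values differ slightly). To keep the result easy to check, it runs hypothetical 3-day and 5-day spans over a made-up price path; substitute 13 and 50 for the talk's settings.

```python
from typing import List, Optional


def ema(values: List[float], span: int) -> List[float]:
    """Exponential moving average with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        # Today's price gets weight alpha; older history decays by (1 - alpha) each day.
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def crosses(fast: List[float], slow: List[float]) -> List[Optional[str]]:
    """Mark days where the fast EMA crosses above ("bullish") or below ("bearish") the slow EMA."""
    signals: List[Optional[str]] = [None] * len(fast)
    for i in range(1, len(fast)):
        prev = fast[i - 1] - slow[i - 1]
        curr = fast[i] - slow[i]
        if prev < 0 < curr:
            signals[i] = "bullish"
        elif prev > 0 > curr:
            signals[i] = "bearish"
    return signals


closes = [10.0, 9.0, 8.0, 7.0, 8.0, 10.0, 12.0, 11.0, 9.0, 7.0]  # made-up prices
fast, slow = ema(closes, 3), ema(closes, 5)  # use 13 and 50 on real daily closes
print([(i, s) for i, s in enumerate(crosses(fast, slow)) if s])  # [(5, 'bullish'), (9, 'bearish')]
```

Strict inequalities keep the identical seed values on the first day from registering as a cross.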

Moving Average Convergence / Divergence

If the rabbit keeps escaping the dog day after day, that dog may have no chance of catching her prey. This divergence of moving averages is a strong trend indicator, so strong that it overpowers the day-to-day thinking of the participants. It turns bulls into bears and vice versa, and its strength snowballs, as it did in early March, until we hit a climax. Imagine you couldn’t see the prices of the instrument we were watching; you could only see MACD. Looking at this chart, can you tell when sentiment changed? Can you spot the best time to buy and to sell? Now we’re flying on instruments only. When the fast line crosses the slow line, we have an opportunity to get ahead of the pack. In fact, based on MACD alone, our first cross is a misfire: we shorted and had to cover quickly. However, the second time was a charm, and it was a doozy. MACD is built on historical prices, so it’s always late; it’s a conservative indicator of the conversation. What if we want to narrow in a bit and hit the exact peaks and troughs? Well, let’s mix a few more metaphors. Imagine two boxers entering a ring. It’s a well-matched fight, and either fighter can win. However, they both have their limits. They will, if only for a moment, lose their momentum and need to recover their strength. That’s what stochastics measure: what happens at the limits of exhaustion.
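MACD follows directly from the same EMAs. The talk does not state its settings, so this sketch assumes the conventional 12-day and 26-day EMAs for the MACD line and a 9-day EMA of that line as the signal. "The fast line crosses the slow line" is read here as the MACD line crossing its signal line, which shows up as a sign change in the histogram. Function names and prices are hypothetical.

```python
from typing import List, Optional, Tuple


def ema(values: List[float], span: int) -> List[float]:
    """EMA with alpha = 2 / (span + 1), seeded with the first value."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(closes: List[float], fast: int = 12, slow: int = 26,
         signal: int = 9) -> Tuple[List[float], List[float], List[float]]:
    """Return (MACD line, signal line, histogram) using assumed 12/26/9 defaults."""
    fast_ema, slow_ema = ema(closes, fast), ema(closes, slow)
    line = [f - s for f, s in zip(fast_ema, slow_ema)]  # the gap between the rabbit and the dog
    sig = ema(line, signal)
    hist = [m - s for m, s in zip(line, sig)]
    return line, sig, hist


def macd_crosses(hist: List[float]) -> List[Optional[str]]:
    """A histogram sign change marks the MACD line crossing its signal line."""
    out: List[Optional[str]] = [None] * len(hist)
    for i in range(1, len(hist)):
        if hist[i - 1] < 0 < hist[i]:
            out[i] = "bullish"
        elif hist[i - 1] > 0 > hist[i]:
            out[i] = "bearish"
    return out


# Made-up closes: a 30-day slide followed by a 30-day recovery.
closes = [100.0 - i for i in range(30)] + [70.0 + 2 * i for i in range(30)]
line, sig, hist = macd(closes)
print([(i, s) for i, s in enumerate(macd_crosses(hist)) if s])
```

Because both lines are built from past prices, the cross always arrives after the turn in price, which is the lateness the talk describes.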

Stochastics

Stochastic is measured from zero to 100. It’s an average of percentages showing where prices close within their recent high-low range, that is, whether they consistently close near the high or near the low. Over time, bulls and bears enter the exhaustion zone, sometimes called overbought or oversold. A cross in the top 20% of that range signifies a change: it means the bull is exhausted and it’s the bear’s time. A cross in the bottom 20% means the reverse. That said, stochastic frequently misfires. It tends to get pegged in a solid uptrend or downtrend, and that makes sense: the bull or bear has lots of energy and doesn’t need a break. However, when it fires at a real reversal, it’s right, and it’s right on time. Now that we’ve introduced a new vocabulary of group psychology, and shown how quantitative derivatives can give us powerful insight, we’re going to propose a novel strategy, similar to ensemble models.
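The usual definition behind this description is %K = 100 * (close - lowest low) / (highest high - lowest low) over a lookback window (often 14 days), with %D a short (often 3-day) average of %K; the 80 and 20 thresholds correspond to the top and bottom 20% of the range. The talk does not specify its parameters, so the defaults, the helper names, and the tiny example (short periods, closes reused as highs and lows) are assumptions for illustration.

```python
from typing import List, Optional


def stochastic(highs: List[float], lows: List[float], closes: List[float],
               k_period: int = 14, d_period: int = 3):
    """Return (%K, %D) lists; entries are None until there is enough history."""
    n = len(closes)
    k: List[Optional[float]] = [None] * n
    for i in range(k_period - 1, n):
        hh = max(highs[i - k_period + 1:i + 1])
        ll = min(lows[i - k_period + 1:i + 1])
        # Where today's close sits in the recent range: 0 = at the low, 100 = at the high.
        # A flat range is assigned the midpoint, 50 (an assumed convention).
        k[i] = 50.0 if hh == ll else 100.0 * (closes[i] - ll) / (hh - ll)
    d: List[Optional[float]] = [None] * n
    for i in range(d_period - 1, n):
        window = k[i - d_period + 1:i + 1]
        if all(v is not None for v in window):
            d[i] = sum(window) / d_period  # %D: simple average of the last d_period %K values
    return k, d


def exhaustion_crosses(k, d, upper: float = 80.0, lower: float = 20.0):
    """Flag %K crossing %D inside the overbought (> upper) or oversold (< lower) zone."""
    out: List[Optional[str]] = [None] * len(k)
    for i in range(1, len(k)):
        if None in (k[i - 1], d[i - 1], k[i], d[i]):
            continue
        if k[i - 1] >= d[i - 1] and k[i] < d[i] and d[i] > upper:
            out[i] = "bearish"  # the bull is exhausted
        elif k[i - 1] <= d[i - 1] and k[i] > d[i] and d[i] < lower:
            out[i] = "bullish"  # the bear is exhausted
    return out


closes = [1.0, 2.0, 3.0, 4.0, 5.0, 4.9, 3.0]  # made-up prices
k, d = stochastic(closes, closes, closes, k_period=3, d_period=2)  # close-only approximation
print(exhaustion_crosses(k, d))  # [None, None, None, None, None, 'bearish', None]
```

While prices grind higher, %K sits pegged at 100, which is the "pegged" behavior mentioned above; the signal only appears once %K drops back through %D.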

Investment Objective

The first indicator will be the bullish or bearish cross of stochastic. We don’t mindlessly buy or sell based on a single indicator; we look for confirmations. The second indicator may confirm that it’s time to act. If we’re conservative, we may wait for a third, sacrificing some profit to reduce our risk in the long run. The first mantra of every good wealth manager is capital preservation. This is great, but we’re left with a serious problem: we can’t see the future. Nobody can. Using deep learning, however, we can predict what’s coming next with a degree of confidence well beyond a coin toss, and act with more confidence than the lowly technical trader of former times. Let’s take a look at our options. The ARIMA algorithm is especially useful for datasets that can be mapped to stationary time series. The statistical properties of a stationary time series, such as its autocorrelations, are independent of time.
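The confirmation rule can be expressed as a small filter over per-day indicator signals. This is a hypothetical sketch, not the notebook's code: each indicator is assumed to emit "bullish", "bearish", or None per day, the stochastic cross acts as the trigger, and we act only when `required` indicators (trigger included) agree within `window` days of it. Setting `required=3` gives the conservative variant that waits for a third confirmation.

```python
from typing import Dict, List, Optional


def confirmed_signals(signals: Dict[str, List[Optional[str]]], trigger: str = "stochastic",
                      required: int = 2, window: int = 5) -> List[Optional[str]]:
    """Return the action per day: a direction on the day enough confirmations arrive, else None."""
    n = len(signals[trigger])
    actions: List[Optional[str]] = [None] * n
    for t, direction in enumerate(signals[trigger]):
        if direction is None:
            continue
        agreed = {trigger}
        # Only confirmations on or after the trigger day, up to `window` days later, count.
        for day in range(t, min(t + window + 1, n)):
            for name, series in signals.items():
                if name not in agreed and series[day] == direction:
                    agreed.add(name)
            if len(agreed) >= required:
                actions[day] = direction  # act on the day the last needed confirmation arrives
                break
    return actions


# Hypothetical per-day signals from three indicators.
signals = {
    "stochastic": [None, "bullish", None, None, None, None, None],
    "ema_cross":  [None, None, None, "bullish", None, None, None],
    "macd":       [None, None, None, None, None, "bullish", None],
}
print(confirmed_signals(signals))              # [None, None, None, 'bullish', None, None, None]
print(confirmed_signals(signals, required=3))  # [None, None, None, None, None, 'bullish', None]
```

Waiting for the third confirmation moves the entry from day 3 to day 5: some profit is given up in exchange for more agreement.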

Timeseries Prediction

We can reformat our data to stationarize it, but trends are by nature not stationary, so we’re off to a bad start. NPTS generates predictions for each time series individually. This is getting better. The ETS algorithm is especially useful for datasets with seasonality and other prior assumptions about the data. ETS computes a weighted average over all observations in the input time series dataset as its prediction. The weights decrease exponentially over time, rather than being constant as in simple moving average methods, and they depend on a constant parameter known as the smoothing parameter. Facebook’s Prophet is especially useful for datasets that contain an extended period of detailed historical observations, have multiple strong seasonalities, include previously known important but irregular events, have missing data points or large outliers, and have nonlinear growth trends that are approaching a limit. But all of the above are statistical methods that fall short of the power of deep learning. DeepAR is a supervised learning algorithm for forecasting scalar (one-dimensional) time series using recurrent neural networks.
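Two ideas from this paragraph are easy to show in plain Python: first differencing, one common way to stationarize a trending series, and simple exponential smoothing, the no-trend, no-seasonality member of the ETS family, whose forecast is a weighted average with exponentially decreasing weights set by the smoothing parameter alpha. This is an illustrative sketch, not SageMaker's ETS implementation; initializing the level with the first observation is an assumed convention.

```python
from typing import List


def difference(series: List[float]) -> List[float]:
    """First differences: today's value minus yesterday's."""
    return [b - a for a, b in zip(series, series[1:])]


def ses_forecast(series: List[float], alpha: float) -> float:
    """Simple exponential smoothing; the one-step-ahead forecast is the final level."""
    level = series[0]  # assumed initialization: the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level


def ses_weights(n: int, alpha: float) -> List[float]:
    """Weight each of n observations receives in the forecast, most recent first."""
    w = [alpha * (1 - alpha) ** k for k in range(n - 1)]
    w.append((1 - alpha) ** (n - 1))  # the oldest point also carries the initial level
    return w


trend = [100.0 + 2 * i for i in range(6)]  # a made-up straight-line trend
print(difference(trend))                   # [2.0, 2.0, 2.0, 2.0, 2.0]
print(ses_weights(4, 0.5))                 # [0.5, 0.25, 0.125, 0.125]
print(ses_forecast([10.0, 20.0], 0.5))     # 15.0
```

The trend itself is not stationary, but its differences are constant, which is the kind of reformatting ARIMA relies on. A larger alpha puts more weight on recent observations.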

SageMaker DeepAR

Classical forecasting methods, such as ARIMA and ETS, fit a single model to each individual time series, and then use that model to extrapolate the time series into the future. In many applications, however, you have many similar time series across a set of cross-sectional units, just like stocks in a portfolio. In our case it would be beneficial to train a single model jointly over all of the time series, and DeepAR takes this approach. When your dataset contains hundreds of related time series, the DeepAR algorithm outperforms the standard ARIMA and ETS methods. You can also use the trained model to generate forecasts for new time series that are similar to the ones it has been trained on. It’s commonly said that 70% of any stock’s movement is due to the broad indices, 20% to the industry, and just 10% to company news. DeepAR looks like the right choice for predicting a stock portfolio. Now, Rumi will introduce the architecture we’ve proposed for this forecast.

– Thank you, Chris. Here’s our reference architecture, with the Databricks Unified Analytics Platform in the center. Amazon S3 and Amazon Kinesis are shown as the data sources from which Databricks gets data. Databricks prepares the data to be useful for building a machine learning model. Once the training and test datasets are ready, you upload them to an S3 bucket. You can upload the datasets and kick off the training all from the notebook running on the Databricks platform. As you can see in the diagram, after you have finished processing the raw data, Databricks can also write data to Amazon Redshift, a data warehouse service; Amazon RDS, a relational database service; and Amazon DynamoDB, a NoSQL database. You can also send the data to BI or reporting tools straight from Databricks. Towards the end of last year we launched our new service, AWS Data Exchange.
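DeepAR reads training and test data as JSON Lines, one time series per line, with a `start` timestamp and a `target` array, plus optional fields such as `cat`. The talk does not show its data-preparation code, so this is a minimal sketch under assumptions: one series of daily closes per ticker, one category index per ticker, and a training file that holds back the last `prediction_length` points while the test file keeps each full series (the layout AWS recommends for evaluation). The helper name and the placeholder values are hypothetical, and uploading to S3 and starting the training job are left out.

```python
import json
from typing import Dict, List, Tuple


def to_deepar_jsonl(series_by_ticker: Dict[str, Tuple[str, List[float]]],
                    prediction_length: int) -> Tuple[str, str]:
    """Build (train, test) JSON Lines text, one line per ticker's time series."""
    train_lines, test_lines = [], []
    for cat, (ticker, (start, values)) in enumerate(sorted(series_by_ticker.items())):
        if len(values) <= prediction_length:
            raise ValueError(f"{ticker}: need more than {prediction_length} points")
        # `cat` gives each ticker its own category so the single joint model can tell them apart.
        record = {"start": start, "cat": [cat]}
        # Training data holds back the last prediction_length points...
        train_lines.append(json.dumps({**record, "target": values[:-prediction_length]}))
        # ...while the test data keeps the full series so forecasts can be scored against it.
        test_lines.append(json.dumps({**record, "target": values}))
    return "\n".join(train_lines) + "\n", "\n".join(test_lines) + "\n"


# Placeholder values, not market data.
series = {
    "AMZN": ("2020-01-02 00:00:00", [1.0, 2.0, 3.0, 4.0]),
    "SPY": ("2020-01-02 00:00:00", [5.0, 6.0, 7.0, 8.0]),
}
train, test = to_deepar_jsonl(series, prediction_length=2)
print(train)
```

If you keep the `cat` field, check that DeepAR's `cardinality` hyperparameter is consistent with the number of tickers.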