/stock_db_capture

Daily and Intraday OHLCV bars capture, fast message-driven parallel backtesting engine with trading strategy DSL, trading simulator, daily stock watcher

The name of this project is really a misnomer. Initially this project
was concerned with finding a robust way to gather OHLCV bars on
trading days. There are several "free" data-sources available: Yahoo
Finance, Google Finance, and TDAmeritrade (which is not exactly free
-- you have to have an account). Since we did have a TDAmeritrade
account we used their data-source because we could get not only daily
bars but also intraday bars at 5, 10, 15, and 30 minute
intervals. TDAmeritrade also had a "live" feed where we could easily
get snapshot information down to the minute. They also offer Level I
and Level II streams. This is where the "capture" term in the project
name came from. The project grew from that very quickly into one where
we could design strategies and backtest them over the 10 years of daily
bars we had in the DB, from which a collection of positions is
generated. The opening and closing of each position is determined by
the strategy being backtested.

A collection of positions is not quite enough information to simulate
an actual trading strategy. It does not include real-life limitations
like the amount of money available to invest on any given day, nor the
investment meta-strategy: whether returns are re-invested into the
market or the excess returns are taken out in cash to produce an
income. Taking the returns out, obviously, does not allow for any
compounding of funds, but in real life that may be a requirement.
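
The difference between a bare collection of positions and a simulated
account can be sketched as below. This is only an illustration, not the
simulator shipped in this project: the Position fields, the simulate
method and its parameters (cash, max_open, per_trade, reinvest) are all
hypothetical names chosen for the example.

```ruby
require 'date'

# Hypothetical record for one backtested position (field names are assumptions).
Position = Struct.new(:symbol, :entry_date, :exit_date, :entry_price, :exit_price)

# Replay positions day by day against a finite account.
#   cash      - starting cash
#   max_open  - maximum number of positions held at once
#   per_trade - cash committed to each new position
#   reinvest  - true: gains go back into cash (compounding);
#               false: gains are set aside as income
def simulate(positions, cash:, max_open:, per_trade:, reinvest: true)
  held = []
  income = 0.0
  skipped = { cash: 0, slots: 0 }
  days = positions.flat_map { |p| [p.entry_date, p.exit_date] }.uniq.sort

  days.each do |day|
    # Close positions first so their cash and slots are available today.
    closing, held = held.partition { |h| h[:pos].exit_date == day }
    closing.each do |h|
      proceeds = h[:shares] * h[:pos].exit_price
      gain = proceeds - h[:cost]
      if reinvest || gain <= 0
        cash += proceeds
      else
        cash += h[:cost]
        income += gain
      end
    end

    # Then try to open every position the backtest generated for today.
    positions.select { |p| p.entry_date == day }.each do |p|
      if held.size >= max_open
        skipped[:slots] += 1        # "position starved"
      elsif cash < per_trade
        skipped[:cash] += 1         # "cash starved"
      else
        shares = (per_trade / p.entry_price).floor
        next if shares.zero?
        cost = shares * p.entry_price
        cash -= cost
        held << { pos: p, shares: shares, cost: cost }
      end
    end
  end

  { cash: cash, income: income, skipped: skipped }
end
```

Running the same positions with reinvest: true and reinvest: false shows
the effect of the meta-strategy, and the skipped counters show how many
backtested entries could not actually have been taken.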

In the end stock_db_capture was designed to:

1. Capture Daily and Intraday data on a regular basis (via cron running
  rake tasks). Currently daily and 30 minute intraday bars are captured.
  It should also be noted that, at least for TDAmeritrade's data-sources,
  it's possible for there to be "missing" bars, i.e. trading days for
  which a daily bar was not captured on their end and thus shows up
  as missing on our end. Fortunately, I have written an intelligent
  timeseries class which detects these "holes" and rejects the timeseries.
  I have written rake tasks which can fill these holes with Google or Yahoo
  bars, the union of which fills nearly all (over 98%) of the holes.
  Additionally, I detect splits by scraping certain web pages, the results
  of which are stored in a DB table. At present, nothing is done with this
  information to back propagate split information.

2. Support the design of trading strategies of arbitrary complexity and
  the backtesting of them.

3. From the collection of positions generated in #2, simulate the
  execution of trades using finite resources. Finite resources include
  the amount of cash on hand for that day AND the number of positions
  available that day. Depending on market conditions it is very possible
  to be either "cash starved" or "position starved" on any given day.
  Without this kind of simulation it would be impossible to know that.

4. Provide for a "stock watcher" which triggers entries from a list of
  stocks which meet a certain criterion for opening a position (assuming
  you can from a cash standpoint). Once opened, the Position is then
  tracked daily, making it possible to know just how close the position
  is to meeting the established "closing" criteria. Of course the position
  can be closed (sold) at any time. The closing criteria are computed
  from the results of backtests and therefore reflect a certain optimum
  condition.

5. The results from backtests can also be fed into R to produce
  wonderful graphs which help "tune" the strategy. The R econometrics
  packages are, in themselves, very sophisticated. One can do "quant-like"
  analysis in R.

6. Several months of "paper trading" using an RSI/RVI driven strategy
  produced favorable results, but the evolving ROI was still victim to
  overall market sentiment. (RSI = Relative Strength Index, RVI = Relative
  Volatility Index.) Current strategies try to take advantage of volatility,
  and as such have relatively short hold times (a mean of approx. 11 days).
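
The hole detection and filling described in item 1 can be sketched as
follows. This is not the project's timeseries class: the method names
(expected_trading_days, find_holes, fill_holes), the holiday list and
the hash-per-source layout are assumptions made for the example.

```ruby
require 'date'
require 'set'

# Weekdays between first and last, minus a caller-supplied holiday list.
# (A real trading calendar is assumed to be provided by the caller.)
def expected_trading_days(first, last, holidays = [])
  (first..last).reject { |d| d.saturday? || d.sunday? || holidays.include?(d) }
end

# Dates inside the series' own range for which no daily bar exists.
def find_holes(bar_dates, holidays = [])
  return [] if bar_dates.empty?
  have = bar_dates.to_set
  expected_trading_days(bar_dates.min, bar_dates.max, holidays)
    .reject { |d| have.include?(d) }
end

# Fill each hole from the first alternate source that has a bar for it.
# Each source is a Hash of Date => bar; holes no source covers are left out.
def fill_holes(holes, *sources)
  holes.each_with_object({}) do |day, filled|
    source = sources.find { |s| s.key?(day) }
    filled[day] = source[day] if source
  end
end
```

A series can then be rejected whenever find_holes returns a non-empty
list, and accepted again once fill_holes covers every missing date.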

The backtester was the facility which went through the most change
throughout the course of the project. Initially it relied upon a high-level
DSL which, while making strategies relatively easy to design, came
at the price of limited expressiveness and complexity.

Once it was recognized that this was a serious limitation of the
backtester I went with a message passing architecture. This decision
was motivated by two forces that I saw acting on sophisticated backtests.
One, they were getting significantly more complex and, two, the amount of
data which they had to crunch was going way up. The only way out of
this computational morass was to go parallel. The design had to scale
with the number of cores available and the number of computers available
on a Gigabit Ethernet. Some form of distributed message passing kernel
seemed the most appropriate. Additionally, this had become a well known
problem with a multitude of solutions and/or libraries or frameworks.
The ones tried so far, with varying degrees of success, were:

1. Rinda -- simple, reliable, and slow. The deal-breaker came when the
  Ring Finger was unable to locate a tuple-server anymore -- ever.

2. EventMachine + RabbitMQ -- relatively simple, advertised as
  reliable, but way too slow for our needs. The RabbitMQ server would
  become simply overwhelmed at a message publishing rate of less than
  100 ms/message (that is, more than about 10 messages per second).

3. EventMachine + Beanstalk -- fast, reliable but had some quirks, most
  likely due to the rather unsophisticated EM-Jack layer. The end result
  was lost messages, which was intolerable. Backtests HAVE to be
  repeatable to be of any use whatsoever.

4. A synchronous version of the Beanstalk protocol resting upon the
  beanstalk-client library, which is a synchronous library. This proved
  stable, yet under certain consistent conditions it would lose the
  first 12 messages generated from the producer/consumers. Interestingly,
  I see this exact same pattern using EM-Jack. Since for the synchronous
  version I "borrowed" the EM.defer(op,cb) protocol for executing the bodies
  of sub-tasks within a strategy, it seemed likely that the problem followed
  EM.defer. I have been through every line of code, however, and I cannot
  see a problem. Given that, plus the fact that EM.defer is in use in
  probably a hundred successful implementations using EventMachine, that
  logic didn't seem to fly.

5. Which leads to the present. I do not have a message-based
  backtester that is robust. It is my hope that somebody who knows
  EventMachine much better than I do will be able to suss this problem out.
  Trying to debug this using ruby-1.9.2 w/o a debugger proved to be a
  nightmare. Messaging systems by their very nature are difficult to
  troubleshoot as they have non-deterministic message patterns.
  What it looks like, as unlikely as it seems, is that defers complete,
  but the completion callback is not called. I have coded extremely
  hi-res and accurate instrumentation in the messaging code and the
  numbers consistently point out this 12 event pattern of completion
  callbacks not being called. Twelve is not an obvious magic number
  either: the number of threads in the pool is 20.
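
For anyone who wants to reproduce the symptom outside of EventMachine,
here is a minimal stdlib-only sketch of the defer pattern with the kind
of counters described above. It is NOT EventMachine and not this
project's messaging code: DeferPool and its methods are hypothetical,
and only mirror the shape of EM.defer(op, cb), where op runs on a pool
thread and cb receives its result on the main thread.

```ruby
require 'thread'

# Stand-in for a defer-style thread pool (an assumption for illustration).
class DeferPool
  attr_reader :ops_done, :callbacks_run

  def initialize(size)
    @jobs = Queue.new        # work waiting for a pool thread
    @results = Queue.new     # finished work waiting for its callback
    @lock = Mutex.new
    @submitted = 0
    @ops_done = 0
    @callbacks_run = 0
    @workers = Array.new(size) do
      Thread.new do
        while (job = @jobs.pop) != :stop
          op, callback = job
          # Pass an exception along as the value so a failing op
          # cannot silently swallow its callback.
          value = begin
                    op.call
                  rescue StandardError => e
                    e
                  end
          @lock.synchronize { @ops_done += 1 }
          @results << [callback, value]
        end
      end
    end
  end

  # Queue op for a pool thread; callback will later receive its result.
  def defer(op, callback)
    @submitted += 1
    @jobs << [op, callback]
  end

  # Run callbacks on the calling thread until every submitted job reported.
  def drain
    while @callbacks_run < @submitted
      callback, value = @results.pop
      callback.call(value)
      @callbacks_run += 1
    end
  end

  def shutdown
    @workers.size.times { @jobs << :stop }
    @workers.each(&:join)
  end
end
```

After drain and shutdown, ops_done and callbacks_run should be equal. A
persistent difference between the two counters (such as the 12 seen
here) would point at the hand-off between the pool and the callbacks
rather than at the strategy code.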

To begin any serious work on this project you will need to populate your DB.
I am using MySQL and can provide dumps of all the relevant tables. Since
this is my first GitHub project, I don't know the policy on uploading huge
SQL dump files.

Participation in this project does have a significant upside: you could, if
you choose, use this software to make very intelligent trades. All trading
packages with which I am familiar focus on only 1 issue (stock) at a time or
at most a portfolio. This system scans, every trading day, over 7500 stocks
looking for buy and sell patterns.


Best Regards,

Kevin Nolan
kpnolan@comcast.net