My attempt to scrape stock data, analyze the market, ... and tradebot
Python 3.5+
( 2020.06.07 ) For those who already used Redtide before.
- Thank you!
- I pretty much rebuilt it from the ground up and SQUASHED ALL the commits. Pull with caution!
- Live-quote option is deprecated! It sucked, pretty sure no one
used it. Pulling live-quote still exists as a functionality of the new
Stock
data model. More on this below. - Streamlined the stock symbol compilation step by automatically logging into eoddata.com (with your auth using Selenium)
- Scrape company's financial data during the symbol compilation step
- Introducing HoodAPI that uses Selenium to automate trades on
Robinhood, example:
from src.api import HoodAPI hood = HoodAPI() hood.make_order('buy', 'AMD', shares=5, order_type='limit', price=52) hood.order_status('AMD')
- An experimental trade bot option (
--bot
), but go through it carefully first (at least read the brief description below). Run this after compiling symbols and financial data:$ python3 redtide.py --bot --budget 500 --stocks 5 --maxloss 0.9
- The
tzset
issue on Windows is resolved. Everything should work on Windows. I tested on Windows 10. - All that ML stuff is removed because they are utterly useless! I honestly believe that everyone is modeling it better than I am. T_T
If you are looking at this, chances are you were on Reddit. This is currently documented very very poorly. You have been warned. Hopefully, you just want a simple way to grab historic data (end-of-day data) from Yahoo Finance to do your own awesome analysis. Or if you want to venture into automated day-trade. Then Redtide may be a solution for you. For now, I'm a bit too busy and stressed out by work to document anything thoroughly, so... have fun!
- Just clone this repository. You'll want to do everything
in the redtide/ folder (I'll make a wheel at some point)
- I highly recommend creating and doing everything in a virtual
environment, i.e.
$ cd redtide/ $ python3 -m venv . $ source bin/activate
- I highly recommend creating and doing everything in a virtual
environment, i.e.
- Use the requirements.txt file to make sure you have everything you'll need. Run this:
$ pip3 install -r requirements.txt
If you are not familiar with Selenium, it is basically a we browser instance that you can control in Python. Generally used in website test automation, but I'm using it for difficult logins and navigating complex websites.
Skip this if you already have Firefox (gecko webdriver) in PATH. As in,
you know running webdriver.Firefox()
without any arguments will work.
Steps (assume you already installed the Python reqs):
- Download gecko webdriver from here (scroll down to Assets): https://github.com/mozilla/geckodriver/releases
- Unpack and put the executable anywhere you want it
- In
config.yaml
, point"path_geckodriver"
to the path of the gecko executable (the file you just unpacked), example:selenium: path_geckodriver: "where/you/put/geckodriver"
- eoddata.com auth
- This is so Redtide knows what stock symbols are out there
- Register a free account on eoddata.com
- Create a text file (whatever name you want, anywhere you want)
- In the text file, line 1: username, line 2: password; example:
admin password
- In config.yaml point the
auth_file
path undereoddata
to the path of the login file you just created - Alternatively, if you are concerned with security, you can just log in and download the symbol list of each exchange manually and put them in a folder named "listings" under the folder data/. You'll have to create the listings/ folder but Redtide will look in there first. But note that you'll want to repeat this everytime you pull data since stocks can get delisted and enlisted, and at the moment I don't know if there's a source for "diffs" on stock listings.
- Note: I don't work for eoddata.com, just find it to be a good source
- Robinhood auth (Optional, if you want to use HoodAPI)
- Same steps as eoddata.com auth, create a username-password file
but path pointed for
auth_file
underrobinhood
in config.yaml
- Same steps as eoddata.com auth, create a username-password file
but path pointed for
- "stock exchange files," which are just lists of stock symbols tell Redtide which symbols to look for data on Yahoo Fiance. Redtide check with Yahoo to make sure these symbols can be found and verify spelling (i.e. "-" or ".").
- While checking the symbols, Redtide also grabs the financial data of each company as it compiles the symbols.
- User gets a chance to examine the compiled list of stock symbols to make sure they are good to go, or go straight the pulling historical data by stacking command line options.
- Redtide pulls the End-of-Day movements all these stocks as far back as Yahoo provides.
- Navigate to the redtide/ folder and run the following command in a terminal.
$ python3 redtide.py -v -c
- Compilation can take an hour or so depending on the number of processors and your internet speed.
- When it's done, you'll see the new all_symbols.txt and excluded_symbols.txt under the redtide/data/ folder
Pro tip If you have a list of stocks that you care about, and don't
care about any other tickers. Then create your own all_symbols.txt
and keep it in redtide/data/ and skip this step (-c
) entirely
until you want to update their financial data.
- Navigate to redtide/ folder, and run:
$ python3 redtide.py -v -d
- This will about an hour or so. How long depends on the number of processors you have and your internet connection speed.
- When this is done, you will see a full_history/ folder under redtide/data/ that contains all the goodies (i.e. AMD.csv).
Pro tip You can chain both steps by doing:
$ python3 redtide.py -v -c -d
If you want to see some other options:
$ python3 redtide.py -h
At the moment, Robinhood does not have an official API. There are a few unofficial Robhinhood APIs but it seems that they are either not maintained anymore or "working with Robinhood LLC" to stay online. Meaning there's a dependency on Robinhood. This can also mean that they are more stable but also mean Robinhood can have control over it's access (i.e. daily call limits, is it profitable, etc.) So... HoodAPI is built using Selenium with the intention of having full control but not as reliable (i.e. if Robinhood changes layout, HoodAPI will need to be updated with how to navigate).
**Note: will need to setup Robinhood auth in the setup step.
There are 2 basic concepts:
- Action Making buy/sell order is a type of action. Cancel order is another type of action. Only market order and limit order for now, but the code to switch any order type is done, just not the form-filling part for the other types.
- Verify Status Check the status of the action you requested to make sure it was successful.
from src.api import HoodAPI
hood = HoodAPI()
hood.make_order('buy', 'AMD', shares=5)
status = hood.order_status('AMD')
If order went through then status['status']
would be "Done"
.
Otherwise, it can be another Robinhood status message, like
"Pending"
.
It is important to understand that order_status(s)
checks your
most recent order of that stock (i.e. the first element in the order
history for that stock).
If status is not "Done"
, then you can cancel it with
hood.cancel_order('AMD')
Here also, cancel_order(s)
only tries to cancel your most recent order
of that stock.
When you are done, call quit()
to exit safely. This is to prevent a
lingering Firefox session in the background.
hood.quit()
Must be very mindful of the current caveat of check_status(s)
and
cancel_order(s)
as you implement, which, again, is that they only
act on your most recent order of the specified stock
The goal here is to track stock prices, volumes, bids, asks, etc. of
each stock. A Stock('AMD')
object would retrieve and store the latest
data on AMD from Yahoo when you call .update()
. A Stocks([...])
object lets you track multiple tickers via multiprocessing.
A Stock() object can be instantiated for any symbol found on
Yahoo.
When auto_update=True
(default), then update is automatically
done as the Stock object is instantiated. Thus, you can check the
fields like price and volume. auto_update=True
would also enforce
update when certain fields are called, but to prevent update being
too frequence, a staleness limit on the data (i.e. 15 seconds default)
is used so that update is only performed if data is stale.
from src.models import Stock
from time import sleep
s = Stock('AMD', cache_stale_sec=15)
print(s.price)
print(s.volume)
print(s.bid)
print(s.bid_size)
sleep(15)
print(s.price) # update before returning price
A bunch of metrics are track during each .update()
call, but not
all of them are as useful as other, therefore, don't have a dedicated
field for them. But all data can be access under s.metrics
dict. Any
of the following metrics that also have a dedicated field would also
perform auto-update if auto_update=True
.
Yahoo metrics | Stock obj metrics | has field |
---|---|---|
regularMarketPrice | price | yes |
regularMarketVolume | volume | yes |
bid | bid | yes |
bidSize | bid_size | yes |
ask | ask | yes |
askSize | ask_size | yes |
currency | currency | yes |
regularMarketChange | change | no |
regularMarketDayHigh | high | no |
regularMarketDayLow | low | no |
sharesOutstanding | shares | no |
marketCap | market_cap | no |
regularMarketPreviousClose | previous_close | no |
regularMarketOpen | open | no |
averageVolume10days | volumne10days | no |
During each update(), these metrics are also written to `data/live_quotes/ (default path) as lines of JSON (note: not a JSON file!) Each line in the file can be interpreted with a JSON interpreter. The sole reason that this is so weird is because all this is still quite experimental, and I want a structure that's as easily accessible as it is dynamic.
s.price_trend(k=5) returns a Scipy linregress result object on the
linear regression fit of the last 5 price data points (5 update()
calls are required)
s.volume_trend(k=5) returns a Scipy linregress result object on the
linear regression fit of the last 5 volume data points (5 update()
calls are required)
Stocks() object manages multiple Stock()
objects. Example:
from src.models import Stocks
from time import sleep
ss = Stocks(['AMD', 'AAPL', 'TSLA']).update()
ss.add_symbols(['UAL', 'DAL', 'AAL'])
ss.remove('TSLA')
# update with 5 sec intervals
for _ in range(5):
sleep(5)
ss.update()
# ask "what is the price trend of all the stocks I'm tracking?"
# this does a T-test with the R^2 values of all the stocks against
# zeros. Not the best way to do this statistics, but less data
# required to get a rough estimate.
ss.price_trend(k=5, metric='rvalue')
It's so experimental, I haven't made a single cent with it yet! Trading works, but I'm new to day-trade and I'm not sure what's a good strategy here. The trade bot basically utilizes the HoodAPI to automate trades. Feel free to look into to the code src/tradebot.py and I sincerely hope you are more successful than I am.
To run with default ($500 budget, at at most 5 stocks, 90% budget max loss)
$ python3 redtide.py --bot
Or more controls
$ python3 redtide.py --bot --budget 1000 --stocks 10 --maxloss 0.95
May the Force be with you... s
I want to exploit this great resource via crowd sourcing. They limit free users to 5 requests per minute and 500 requests per day. There are about 8600 tickers in AMEX, NYSE, and NASDAQ. Their API lets user get intraday data up to a week so it'll take 18 users less than 2 hours to collect intraday data each EOD. Code for this is in progress. There are much more we can do by joining force. Let me know if this is interesting to you. :)
- u/siem found a problem with how Macs handles forking. His full comment:
"I had problems running the code at first on my Mac ("python +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.") - supposedly Apple has changed their fork implementation to disallow forking with active threads. When I ran this before running the code I didn't get errors:
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
"
- u/siem also found that after 2000-3000 pulls, connection to Yahoo Finance could fail. Possibly due to a tempoary IP ban. After that, I also noticed that around 120 fast page crawls, there's a temporary IP ban. To circumvent this issue, I implemented pauses around 10 - 20 seconds for every 100 page loads. Also, did the same for compiling the symbols, except the pauses are 5 - 10 seconds for every 200 page loads.
- New: If connection fails or you get banned temporary, it will try to fetch for the failed ones again (maximum of 5 passes). If after 5 passes , there are still failed symbols left, they will be written to a failed_symbs-<5 random characters>.txt file. And you'll see a suggestion to run something like the following to retry.
# For failed daily history fetches
$ python3 redtide.py -v -d --file failed_symbs-abe93.txt
# For failed symbols during symbol compile
$ python3 redtide.py -v -c --file data/excluded_symbols.txt
- Helpful Redditor u/siem discovered ways to resolve forking issue on Mac and IP ban issue with Yahoo.
- Big thanks to John Ecker (JECKER@rollins.edu) and Lukáš Vokráčko (vokracko) for help make Redtide better with debugging and better documentations! Much appreciated!
Gosh... Where do I even begin... email me if you are interested.
Let me know what you think, and if you want to help out on my odd endeavor to be less poor. I don't respond very quickly, but I always try to respond: jiunyyen@gmail.com