
Yahoo Symbol Downloader

Primary language: Python, License: MIT

Yahoo Symbols Downloader

This is a blazing fast Python script to download almost all Yahoo Finance symbols.

Install

pip install git+https://github.com/legout/yahoo-symbols.git

Usage Example

python -m yahoo_symbols.download --max-combination-length=2 --types=equity,etf --output=./database --output-type=parquet

Options

Usage: download.py [OPTIONS]

Options:
  --max-combination-length INTEGER The maximum length of combinations to search for. 
                                   Higher numbers may yield more results at the cost of longer download times.
                                   [default: 2]
  --types TEXT                     Choose one or several types. 
                                   Available types are `equity, mutualfund, etf, index, future, currency, cryptocurrency`
                                   [default: equity]
  --random-proxy / --use-random-proxy
                                   Use a random proxy for each request. Currently only proxies from webshare are supported.
                                   [default: no-random-proxy]
  --verbose / --no-verbose         Whether to show a progress bar or not. [default: verbose]
  --validation / --no-validation   Run a final validation of the downloaded symbols. [default: validate]
  --output TEXT                    The output path where the downloaded symbols are saved to. [default: ./db]
  --output-type TEXT               Defines the output type. Options are `parquet`, `csv` or `sqlite3`. [default: parquet]
  --help                           Show this message and exit.
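The --types option accepts a comma-separated list, as in the usage example (`--types=equity,etf`). Below is a minimal sketch of how such a value could be split and checked against the types listed in the help text. The helper `parse_types` is hypothetical and not part of the yahoo_symbols package; it only illustrates the expected input format.

```python
# Hypothetical helper, not part of yahoo_symbols: illustrates the --types format.
AVAILABLE_TYPES = (
    "equity", "mutualfund", "etf", "index",
    "future", "currency", "cryptocurrency",
)


def parse_types(value: str) -> list[str]:
    """Split a comma-separated --types value and validate each entry."""
    # Strip whitespace, lower-case, and drop empty items such as a trailing comma.
    types = [t.strip().lower() for t in value.split(",") if t.strip()]
    unknown = [t for t in types if t not in AVAILABLE_TYPES]
    if unknown:
        raise ValueError(f"unknown type(s): {', '.join(unknown)}")
    # Remove duplicates while keeping the order given by the user.
    return list(dict.fromkeys(types))


if __name__ == "__main__":
    print(parse_types("equity,etf"))  # ['equity', 'etf']
```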

Tips

Number of requests

Benchmarks for a single asset type (tested with type equity):

  Max. combination length                1         2         3         4
  Number of requests                    38      1482     56354   2141490
  Estimated download duration*         ~3s     ~1min    ~10min       ~3h
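The request counts in this table follow a simple pattern: each one is the sum of 38^k for k = 1 to n, where n is the maximum combination length (38, then 38 + 38² = 1482, and so on). That is consistent with one request per character combination of every length up to n over an alphabet of 38 characters. The alphabet used below (`EXAMPLE_ALPHABET`) is a hypothetical stand-in inferred from the numbers, not taken from the script.

```python
# Sketch: reproduce the "Number of requests" row of the benchmark table.
# Assumption: one request per combination of 1..n characters drawn from a
# 38-character alphabet (inferred from the numbers, not from the source code).
import string
from itertools import product

ALPHABET_SIZE = 38
# Hypothetical 38-character alphabet: a-z, 0-9 plus two punctuation characters.
EXAMPLE_ALPHABET = string.ascii_lowercase + string.digits + ".-"


def number_of_requests(max_length: int, alphabet_size: int = ALPHABET_SIZE) -> int:
    """Closed form: sum of alphabet_size**k for k = 1..max_length."""
    return sum(alphabet_size**k for k in range(1, max_length + 1))


def count_by_enumeration(max_length: int, alphabet: str) -> int:
    """Brute-force check: enumerate every combination of each length."""
    return sum(
        1 for k in range(1, max_length + 1) for _ in product(alphabet, repeat=k)
    )


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, number_of_requests(n))
    # 1 38
    # 2 1482
    # 3 56354
    # 4 2141490
```

Each extra character multiplies the number of requests by roughly 38, which is why going from length 3 to 4 turns minutes into hours.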

Best practice

  • You'll get the best results (the most unique symbols) if you run this script separately for each type (equity, etf, ...).
  • The option --max-combination-length should be set to 2 or 3.
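Following this advice, the download can be run once per type from a small Python driver. This sketch only builds and runs the documented `python -m yahoo_symbols.download` command with documented options. Writing each type to its own subfolder of `./database` is an assumption made here so that runs never share an output path; the helper names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable

# Types to download one at a time (any subset of the documented types).
TYPES = ("equity", "etf", "mutualfund", "index")


def build_command(asset_type: str, output_root: str = "./database",
                  max_length: int = 2, output_type: str = "parquet") -> list[str]:
    """Assemble the CLI call for a single type, using only documented options."""
    output = Path(output_root) / asset_type  # hypothetical per-type layout
    return [
        sys.executable, "-m", "yahoo_symbols.download",
        f"--max-combination-length={max_length}",
        f"--types={asset_type}",
        f"--output={output}",
        f"--output-type={output_type}",
    ]


def download_all(types: Iterable[str] = TYPES) -> None:
    """Run one download per type; check=True stops at the first failed run."""
    for asset_type in types:
        subprocess.run(build_command(asset_type), check=True)
```

Calling `download_all()` then runs the four downloads one after another.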

Using a random proxy server

Note: This script should work fine without random proxies.

When the option --use-random-proxy is set, free proxies* are used. In my experience these proxies are not reliable, but you may get lucky.

Webshare.io proxies

I am using proxies from webshare.io. I am very happy with their service and pricing. If you want to use their service too, sign up (use this link if you want to support my work) and choose a plan that fits your needs. Next, go to Dashboard -> Proxy -> List -> Download and copy the download link. Set this link as the environment variable WEBSHARE_PROXIES_URL before running the download script.
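To illustrate what "a random proxy for each request" can look like, here is a minimal sketch that downloads the list behind WEBSHARE_PROXIES_URL and picks one entry at random. It assumes each line of the downloaded list has the form `ip:port:username:password`; that format and all helper names are assumptions, not taken from the script. The returned mapping follows the shape accepted by the `proxies` argument of the `requests` library.

```python
import random
import urllib.request


def parse_proxy_list(text: str) -> list[str]:
    """Turn 'ip:port:username:password' lines into proxy URLs (assumed format)."""
    proxies = []
    for line in text.splitlines():
        parts = line.strip().split(":")
        if len(parts) != 4:
            continue  # skip blank or unexpected lines
        host, port, user, password = parts
        proxies.append(f"http://{user}:{password}@{host}:{port}")
    return proxies


def fetch_proxies(url: str) -> list[str]:
    """Download the proxy list from the Webshare download link."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_proxy_list(response.read().decode("utf-8"))


def random_proxy(proxies: list[str]) -> dict[str, str]:
    """Pick a random proxy, as a mapping in the style used by `requests`."""
    proxy = random.choice(proxies)
    return {"http": proxy, "https": proxy}
```

For example, `random_proxy(fetch_proxies(os.environ["WEBSHARE_PROXIES_URL"]))` returns a freshly chosen proxy on every call, so it can be evaluated once per request.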

Export WEBSHARE_PROXIES_URL in your Linux shell

$ export WEBSHARE_PROXIES_URL="https://proxy.webshare.io/api/v2/proxy/list/download/abcdefg1234567/-/any/username/direct/-/"

You can also set this environment variable permanently, either in a .env file (see the .env-exmaple) in your home folder or current folder, or in your shell config file (e.g. ~/.bashrc).

Write WEBSHARE_PROXIES_URL into .env

WEBSHARE_PROXIES_URL="https://proxy.webshare.io/api/v2/proxy/list/download/abcdefg1234567/-/any/username/direct/-/"

Or write WEBSHARE_PROXIES_URL into your shell config file (e.g. ~/.bashrc)

$ echo 'export WEBSHARE_PROXIES_URL="https://proxy.webshare.io/api/v2/proxy/list/download/abcdefg1234567/-/any/username/direct/-/"' >> ~/.bashrc
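The lookup of WEBSHARE_PROXIES_URL from the shell environment or a .env file in the current or home folder can be sketched with the standard library alone. This is an illustration under two assumptions: the process environment wins over .env files, and the .env file uses simple `KEY="value"` lines. The script itself may rely on a library such as python-dotenv instead, and the helper names here are hypothetical.

```python
import os
from pathlib import Path
from typing import Optional

VAR_NAME = "WEBSHARE_PROXIES_URL"


def read_dotenv_value(path: Path, key: str) -> Optional[str]:
    """Return key's value from a simple KEY="value" .env file, if present."""
    if not path.is_file():
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line.startswith("export "):
            line = line[len("export "):]  # tolerate shell-style lines too
        name, sep, value = line.partition("=")
        if sep and name.strip() == key:
            # Drop surrounding whitespace and quotes around the value.
            return value.strip().strip('"').strip("'")
    return None


def resolve_proxies_url(search_dirs: Optional[list[Path]] = None) -> Optional[str]:
    """Check the environment first, then .env files (assumed lookup order)."""
    if os.environ.get(VAR_NAME):
        return os.environ[VAR_NAME]
    if search_dirs is None:
        search_dirs = [Path.cwd(), Path.home()]
    for directory in search_dirs:
        value = read_dotenv_value(directory / ".env", VAR_NAME)
        if value:
            return value
    return None
```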

*Free proxies are scraped from here:


Support my work :-)

If you find this useful, you can buy me a coffee. Thanks!
