This project aims to provide easy-to-use tools for extracting the net asset values (NAVs) of exchange-traded funds (ETFs) from public websites.
NAVScraper is based on the Scrapy framework and is sponsored by Scrapinghub.
Contributed scripts might require additional software, such as:
(These commands are executed within the project directory.)
Listing available spiders:
$ scrapy list
Scraping available funds:
$ scrapy crawl vanguard_funds
Scraping data from one fund (using a fund_id value scraped by the previous command) for the current year:
$ scrapy crawl vanguard -a fund_id=0967
Scraping data for a specific date range:
$ scrapy crawl vanguard -a fund_id=0967 -a date_start=01/01/2012 -a date_end=01/30/2012
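Scrapy hands each -a name=value pair to the spider as a string keyword argument, so date arguments like the ones above have to be parsed before they can be used. The following is a minimal sketch, not NAVScraper's actual code: the helper name parse_date_range is hypothetical, and the MM/DD/YYYY format is an assumption inferred from the example value 01/30/2012.

```python
from datetime import date, datetime

# Assumption: dates are given as MM/DD/YYYY, as in the example 01/30/2012.
DATE_FORMAT = "%m/%d/%Y"


def parse_date_range(date_start: str, date_end: str) -> tuple[date, date]:
    """Parse two MM/DD/YYYY strings and check that they form a valid range."""
    start = datetime.strptime(date_start, DATE_FORMAT).date()
    end = datetime.strptime(date_end, DATE_FORMAT).date()
    if start > end:
        raise ValueError(f"date_start {date_start} is after date_end {date_end}")
    return start, end


start, end = parse_date_range("01/01/2012", "01/30/2012")
print(start.isoformat(), end.isoformat())
```

Validating the range up front lets a bad argument fail immediately rather than after requests have been sent.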
Scraping data from multiple funds and storing the output in a file:
$ scrapy crawl vanguard -a fund_id=0951,0955,3184,0963,0936,0960 -o output.jl
Note
The extension .jl is used by convention to indicate that the file contains one JSON object per line.
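Because each line of a .jl file is a standalone JSON document, the output can be loaded with the standard library alone. This is a minimal sketch; the function name read_json_lines is hypothetical and not part of the project.

```python
import json


def read_json_lines(path):
    """Return a list with one decoded JSON object per non-empty line."""
    items = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                items.append(json.loads(line))
    return items
```

For example, read_json_lines("output.jl") would return the scraped items as a list of dictionaries.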
Scraping available funds:
$ scrapy crawl wisdomtree_funds
Scraping data from one or more funds:
$ scrapy crawl wisdomtree -a fund_id=40,42 -o output.jl
Note
This spider scrapes all historical values, as the site does not provide an option to filter by date range.
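Since the full history is returned, a date range can be applied to the scraped output afterwards. The sketch below assumes the items follow the NAV format described in this document, with parallel dates and values arrays and ISO-formatted (YYYY-MM-DD) date strings; the function name filter_nav is hypothetical.

```python
def filter_nav(nav, start, end):
    """Keep only the entries whose date lies in the inclusive range [start, end].

    Assumes dates are ISO strings (YYYY-MM-DD), which sort chronologically
    when compared as plain strings.
    """
    kept = [
        (day, value)
        for day, value in zip(nav["dates"], nav["values"])
        if start <= day <= end
    ]
    return {
        "fund_id": nav["fund_id"],
        "dates": [day for day, _ in kept],
        "values": [value for _, value in kept],
    }
```

Comparing the ISO strings directly avoids parsing every date; if a site emitted a different date format, the dates would need to be parsed first.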
The output can be used for analysis or plotting. The scripts/ directory contains a script, plot.py, that plots the output of a spider.
$ python scripts/plot.py output.jl
The spiders extract two entities: Fund and NAV.
Fund fields:
- id: Identifier (per-site value).
- symbol: Fund ticker symbol.
- name: Fund name.
For example:
{ "id": "0938", "symbol": "VBK", "name": " Small-Cap Growth " }
NAV fields:
- fund_id: Fund identifier (per-site value).
- dates: Array of dates.
- values: Array of values for the given dates.
For example:
{ "fund_id": "0938", "dates": ["2013-01-02", "2013-01-03", "2013-01-04"], "values": [76.73, 76.72, 77.15] }
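The dates and values arrays are parallel, so the two are usually paired up before analysis. A minimal sketch using the example item above; the function name nav_series is hypothetical, and it assumes dates are ISO-formatted as shown.

```python
from datetime import date


def nav_series(nav):
    """Pair each date with its value, returning a list of (date, float) tuples."""
    if len(nav["dates"]) != len(nav["values"]):
        raise ValueError("dates and values must have the same length")
    return [
        (date.fromisoformat(day), value)
        for day, value in zip(nav["dates"], nav["values"])
    ]


nav = {
    "fund_id": "0938",
    "dates": ["2013-01-02", "2013-01-03", "2013-01-04"],
    "values": [76.73, 76.72, 77.15],
}
series = nav_series(nav)
latest_day, latest_value = max(series)  # tuples compare by date first
print(latest_day, latest_value)
```

Checking the lengths first guards against silently dropping entries, since zip stops at the shorter of the two arrays.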
0.1-dev
- Added spider to scrape funds and NAVs from vanguard.com.