NAVScraper

This project provides easy-to-use tools for extracting the net asset values (NAVs) of exchange-traded funds (ETFs) published on public websites.

NAVScraper is built on the Scrapy framework and is sponsored by Scrapinghub.

Requirements

Contributed scripts might require additional software, such as:

(These commands are executed within the project directory.)

Listing available spiders:

$ scrapy list

Scraping available funds:

$ scrapy crawl vanguard_funds

Scraping the current year's data for one fund (using a fund_id value returned by the previous command):

$ scrapy crawl vanguard -a fund_id=0967

Scraping data for a specific date range:

$ scrapy crawl vanguard -a fund_id=0967 -a date_start=01/01/2012 -a date_end=01/30/2012
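When the date range is computed in a script rather than typed by hand, the arguments can be assembled programmatically. The following is a minimal sketch, not part of the project: the helper name build_crawl_command is hypothetical, and the MM/DD/YYYY date format is an assumption inferred from the example command above.

```python
from datetime import date


def build_crawl_command(fund_id, start, end):
    """Return the argv list for a date-bounded `scrapy crawl vanguard` run.

    Hypothetical helper, not part of NAVScraper. Assumes the spider
    expects dates as MM/DD/YYYY, as in the example command above.
    """
    if start > end:
        raise ValueError("start date must not be after end date")
    fmt = "%m/%d/%Y"
    return [
        "scrapy", "crawl", "vanguard",
        "-a", f"fund_id={fund_id}",
        "-a", f"date_start={start.strftime(fmt)}",
        "-a", f"date_end={end.strftime(fmt)}",
    ]
```

The returned list can be passed to subprocess.run(..., check=True) with the project directory as the working directory.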

Scraping data from multiple funds and storing the output in a file:

$ scrapy crawl vanguard -a fund_id=0951,0955,3184,0963,0936,0960 -o output.jl

Note

By convention, the .jl extension indicates that the file contains one JSON object per line.
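Because each line is an independent JSON object, such a file can be read one line at a time without loading it all at once. A minimal sketch using only the standard library; parse_jsonlines is a hypothetical helper name, not part of the project:

```python
import json


def parse_jsonlines(lines):
    """Yield one decoded JSON object per non-empty line.

    `lines` can be any iterable of strings, including an open file, e.g.:
        with open("output.jl", encoding="utf-8") as handle:
            items = list(parse_jsonlines(handle))
    """
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines between objects
            yield json.loads(line)
```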

Scraping available funds:

$ scrapy crawl wisdomtree_funds

Scraping data from one or more funds:

$ scrapy crawl wisdomtree -a fund_id=40,42 -o output.jl

Note

This spider scrapes all historical values, because the site does not provide an option to filter by date range.
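If only part of the history is needed, the scraped output can be filtered after the crawl. The sketch below assumes that each scraped item has the shape of the NAV example in this README (a fund_id plus parallel dates and values lists, with dates as YYYY-MM-DD strings); filter_nav_by_date is a hypothetical helper, not part of the project.

```python
from datetime import date


def filter_nav_by_date(item, start, end):
    """Return a copy of a NAV item restricted to start <= date <= end.

    Assumes the item has parallel "dates" (ISO YYYY-MM-DD strings) and
    "values" lists; this shape is an assumption, not a documented API.
    """
    kept = [
        (d, v)
        for d, v in zip(item["dates"], item["values"])
        if start <= date.fromisoformat(d) <= end
    ]
    return {
        "fund_id": item["fund_id"],
        "dates": [d for d, _ in kept],
        "values": [v for _, v in kept],
    }
```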

The output can be used for analysis or plotting. The scripts/ directory contains a script, plot.py, that plots the output of a spider.

$ python scripts/plot.py output.jl

The spiders extract two entities: Fund and NAV.

NAV fields:

For example:

{
  "fund_id": "0938",
  "dates": ["2013-01-02", "2013-01-03", "2013-01-04"],
  "values": [76.73, 76.72, 77.15]
}
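For analysis, the parallel dates and values lists are often easier to work with as (date, value) pairs. A minimal sketch, assuming the item shape shown in the example above; nav_series is a hypothetical helper name, not part of the project:

```python
from datetime import date


def nav_series(item):
    """Convert a NAV item into a date-sorted list of (date, value) pairs.

    Assumes "dates" holds ISO YYYY-MM-DD strings and "values" holds the
    NAV for the date at the same index, as in the example item above.
    """
    dates, values = item["dates"], item["values"]
    if len(dates) != len(values):
        raise ValueError("dates and values must have the same length")
    pairs = [(date.fromisoformat(d), float(v)) for d, v in zip(dates, values)]
    return sorted(pairs)  # tuples sort by date first
```

Checking the list lengths up front surfaces truncated or malformed items early instead of silently dropping trailing entries, which is what zip alone would do.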