/sec-13f-filings

A nicer way to view SEC 13F filings data

Primary LanguageRubyMIT LicenseMIT

SEC 13F Filings

The code for 13f.info, a more user-friendly way to view SEC 13F filings—the quarterly reports that list certain equity assets held by institutional investment managers

The Rails app has two primary functions:

  1. A back end that downloads 13F data from the SEC's EDGAR system and processes it into a structured PostgreSQL database
  2. A front end that provides a way to view the processed data

Even if you don't care about the front end, you might find the code helpful purely for maintaining a relational database of 13F holdings reports

Live examples

Some example links to showcase the app's functionality:

It might be helpful to compare the above to the SEC website's version of the Berkshire Hathaway Q4 2020 13F

Limitations & caveats

Tl; dr: the SEC does not review filings for accuracy, there don't appear to be many validations on the SEC's side to ensure valid submissions, and even if everything is accurate, 13Fs still don't paint a complete picture of a manager's positions and/or investment outlook. Please do your own research before drawing any conclusions from 13F data

  • The SEC puts this notice at the top of every filing on its site:
    • The Securities and Exchange Commission has not necessarily reviewed the information in this filing and has not determined if it is accurate and complete. The reader should not assume that the information is accurate and complete.
  • 13Fs don't include all relevant information about a manager's positions. In particular, they do not include short positions, and only sometimes include options positions. It's plausible that a manager's actual long/short exposure to an investment is the opposite of what is listed in the 13F
  • 13Fs are not very timely. Generally they are filed 45 days after the end of the quarter, by which time a manager's positions and outlook could have changed significantly
  • Reported market values are as of the reporting period, they do not reflect the price at which the manager acquired the shares
  • Anecdotally I've come across many errors, e.g. CUSIPs with typos, misclassified amendments ("new holdings" vs. "restatement"), obvously incorrect market values, and probably more
    • The app mostly passes data through from the SEC's website "as is", though one exception is that it attempts to correct some market values that appear to be overstated by a factor of 1,000

Some other notable limitations more specific to this app:

  • Data is only available starting in 2014, because that's when the SEC began requiring managers to submit XML files according to a spec
    • Filings before 2014 are generally plain text, and don't follow as consistent a structure, though with more work they could presumably be parsed and integrated into this app too
  • Searching for a company by stock symbol is not fully supported
    • This is because the holdings-level SEC data is reported by CUSIP, and the mapping of CUSIPs to stock symbols requires a paid license
    • The cusip_symbol_mappings table can be filled in manually to support search by symbol, but the mapping data is not included in this repo
    • When in doubt, search for a company by name ("Apple") instead of by symbol ("AAPL")
  • The app doesn't know anything about actual historical market prices, stock splits, dividend payouts, and other possibly relevant events. For example, when you're looking at Berkshire Hathaway's Apple holdings over time, you'll see the number of shares nearly quadrupled from Q2 to Q3 2020, which in reality reflects a 4-for-1 split, not a net purchase of shares

Getting started with development

Prerequisites

The app is a fairly standard Ruby on Rails app. Its primary dependencies include:

  • Ruby
  • PostgreSQL
  • Node.js
  • Yarn

Setting up each of these is beyond the scope of this readme, but if you don't know where to begin, I'd recommend the official Getting Started with Rails guide. A future improvement to this repo could be to include a Docker container to help with environment setup

Install Ruby/JavaScript dependencies and initialize database

Once the prerequisite tools are all configured, run the following commands from the project's root directory:

bundle
bundle exec rake db:setup
yarn

Database schema

There are three main tables:

  1. thirteen_fs - one row for each filing. Roughly corresponds to a filing's "primary doc" XML available on the SEC's website
  2. holdings - each thirteen_f record has many holdings. One holding corresponds to a row in the "information table" XML
  3. aggregate_holdings - a denormalized version of holdings which aggregates across the other_manager and investment_discretion columns. In practice it seems like most of the time it's more interesting to look at aggregate_holdings instead of holdings, but the app keeps both around. aggregate_holdings could be a view instead of a table, but I found that the indexed table helped significantly with query performance

There are a few materialized views that are calculated from the above tables and used to determine "canonical" names for each manager and CUSIP, see the db/views/ folder for more

Populate database with 13F data

There are a few ways to populate data. The simplest is to use the provided MinimalDbSeeder class, which will import and process recent filings from a handful of investment managers

bundle exec rake filings:seed_minimal_db

You can change the default managers and/or time periods either by editing minimal_db_seeder.rb, or by specifying options in the Rails console:

# look up manager CIKs at https://www.sec.gov/edgar/searchedgar/cik.htm
my_ciks = ["CIK1", "CIK2"]
filing_periods = [{year: 2018, quarter: 1}, {year: 2018, quarter: 2}]
MinimalDbSeeder.new(ciks: my_ciks, periods: filing_periods).seed_minimal_db!

The minimal db seeder is intended as a quick and easy way to get your database into a useful state for development purposes, but if you want to import all filings from a given quarter, you can use the following method from within the Rails console:

ThirteenF.import_filings!(filing_year: 2021, filing_quarter: 1)

There's also a rake task available to import all filings from all quarters from Q1 2014 through present:

bundle exec rake filings:import_all

The ThirteenF.import_filings! method will create one placeholder row in the thirteen_fs table for each filing on the SEC's website, but it will not fetch the data for each filing. In order to fetch and process the data into the holdings and aggregate_holdings tables, you need to call thirteen_f.process! on each record, which:

  1. Fetches the primary doc and info table XML files from the SEC's website
  2. Stores them in the relevant primary_doc_xml and info_table_xml columns in the thirteen_fs table
  3. Inserts the appropriate rows into the holdings and aggregate_holdings tables

The ThirteenF.cache_data_and_create_holdings_for_unprocessed method will queue up asynchronous delayed jobs to process whatever unprocessed records are in your thirteen_fs table. You can work off those jobs by running a delayed job worker from the project root:

bundle exec rake jobs:work

Processing seems to average about 1.5 records per second, and as of March 2021 there are ~140,000 records, so it might take over a day to process all of them. Note that the SEC's website has rate limits in place so I would not recommend running more than 2 workers at a time

Running a development server

The app uses the Webpacker gem, I find that the best development experience is to run the Rails server and Webpack dev server in separate terminal windows:

rails server
./bin/webpack-dev-server

Keeping the database updated in production

You can run one clock and (at least) one worker process to keep the database up to date as new filings come in. There's also the clockandworker process, which can run on a single Heroku dyno. See the Procfile for usage

Other development notes

The app uses the Tailwind CSS framework. If you've never used Tailwind before, the short version is that you generally don't write CSS, instead you apply preexisting classes to your HTML templates. Special thanks to Edwin Morris for helping me get set up with Tailwind

The tables are built with DataTables, in most cases using AJAX data sources. Most of the relevant logic lives in the DataController and DataTableFormatter classes

There is no logged in experience, which makes it easier to use edge caching via public Cache-Control headers

Ideas for future improvements

  • Ability to analyze cohorts of managers, i.e. given a list of CIKs, look at combined holdings reports, quarterly comparisons
  • Test suite! Especially geared at edge cases like misclassified amendments, reports where managers overstate values by a factor of 1,000
  • Smarter/faster parsing on "13F release days", i.e. Feb 14, May 15, Aug 14, Nov 14
  • Better autocomplete that does not require an exact substring match
  • More official support for searching by stock ticker symbol

Questions/issues/contact

todd@toddwschneider.com, or open a GitHub issue