This is a simple parser with one goal: to hit the SEC's Latest Filings RSS Feed and parse the XML into a workable, JSON-like format.
For example, if you went to the SEC's Latest Filings RSS Feed, you would see XML in the following format:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>Latest Filings - Thu, 25 Feb 2016 18:54:17 EST</title>
<link rel="alternate" href="/cgi-bin/browse-edgar?action=getcurrent"/>
<link rel="self" href="/cgi-bin/browse-edgar?action=getcurrent"/>
<id>http://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent</id>
<author><name>Webmaster</name><email>webmaster@sec.gov</email></author>
<updated>2016-02-25T18:54:17-05:00</updated>
<entry>
<title>10-K - TESORO CORP /NEW/ (0000050104) (Filer)</title>
<link rel="alternate" type="text/html" href="http://www.sec.gov/Archives/edgar/data/50104/000005010416000055/0000050104-16-000055-index.htm"/>
<summary type="html">
<b>Filed:</b> 2016-02-25 <b>AccNo:</b> 0000050104-16-000055 <b>Size:</b> 23 MB
</summary>
<updated>2016-02-25T17:29:49-05:00</updated>
<category scheme="http://www.sec.gov/" label="form type" term="10-K"/>
<id>urn:tag:sec.gov,2008:accession-number=0000050104-16-000055</id>
</entry>
<entry>
<title>10-K - BB&T CORP (0000092230) (Filer)</title>
<link rel="alternate" type="text/html" href="http://www.sec.gov/Archives/edgar/data/92230/000009223016000125/0000092230-16-000125-index.htm"/>
<summary type="html">
<b>Filed:</b> 2016-02-25 <b>AccNo:</b> 0000092230-16-000125 <b>Size:</b> 28 MB
</summary>
<updated>2016-02-25T17:25:57-05:00</updated>
<category scheme="http://www.sec.gov/" label="form type" term="10-K"/>
<id>urn:tag:sec.gov,2008:accession-number=0000092230-16-000125</id>
</entry>
.
.
.
<entry>
<title>10-K - Benefitfocus,Inc. (0001576169) (Filer)</title>
<link rel="alternate" type="text/html" href="http://www.sec.gov/Archives/edgar/data/1576169/000119312516478532/0001193125-16-478532-index.htm"/>
<summary type="html">
<b>Filed:</b> 2016-02-25 <b>AccNo:</b> 0001193125-16-478532 <b>Size:</b> 7 MB
</summary>
<updated>2016-02-25T17:05:40-05:00</updated>
<category scheme="http://www.sec.gov/" label="form type" term="10-K"/>
<id>urn:tag:sec.gov,2008:accession-number=0001193125-16-478532</id>
</entry>
</feed>
The XML has a feed, which has many entries. Parsing the feed (SecLatestFilingsRssFeedParser.parse(xml_document)) would return an Elixir map that looks like the following:
{:ok,
%{entries: [%{cik: "0000050104",
html_link: "http://www.sec.gov/Archives/edgar/data/50104/000005010416000055/0000050104-16-000055-index.htm",
text_link: "http://www.sec.gov/Archives/edgar/data/50104/000005010416000055/0000050104-16-000055.txt",
rss_feed_id: "urn:tag:sec.gov,2008:accession-number=0000050104-16-000055",
summary: "Filed: 2016-02-25 AccNo: 0000050104-16-000055 Size: 23 MB",
title: "10-K - TESORO CORP /NEW/ (0000050104) (Filer)",
updated_date: "2016-02-25T17:29:49-05:00",
category: "10-K"},
%{cik: "0000092230",
html_link: "http://www.sec.gov/Archives/edgar/data/92230/000009223016000125/0000092230-16-000125-index.htm",
text_link: "http://www.sec.gov/Archives/edgar/data/92230/000009223016000125/0000092230-16-000125.txt",
rss_feed_id: "urn:tag:sec.gov,2008:accession-number=0000092230-16-000125",
summary: "Filed: 2016-02-25 AccNo: 0000092230-16-000125 Size: 28 MB",
title: "10-K - BB&T CORP (0000092230) (Filer)",
updated_date: "2016-02-25T17:25:57-05:00",
category: "10-K"},
.
.
.
%{cik: "0001576169",
html_link: "http://www.sec.gov/Archives/edgar/data/1576169/000119312516478532/0001193125-16-478532-index.htm",
text_link: "http://www.sec.gov/Archives/edgar/data/1576169/000119312516478532/0001193125-16-478532.txt",
rss_feed_id: "urn:tag:sec.gov,2008:accession-number=0001193125-16-478532",
summary: "Filed: 2016-02-25 AccNo: 0001193125-16-478532 Size: 7 MB",
title: "10-K - Benefitfocus,Inc. (0001576169) (Filer)",
updated_date: "2016-02-25T17:05:40-05:00",
category: "10-K"}],
updated: "2016-02-25T18:54:17-05:00"}}
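A return value of this shape can be consumed with ordinary pattern matching. The following is a minimal sketch, not part of the library: the result is hard-coded sample data trimmed from the output above rather than a live call to the parser, and the FilingsExample module and its titles_for function are hypothetical names used only for illustration.

```elixir
defmodule FilingsExample do
  # Hypothetical helper, not part of the library: collects the titles of
  # entries whose category matches the given form type.
  def titles_for(%{entries: entries}, category) do
    for %{category: ^category, title: title} <- entries, do: title
  end
end

# Hard-coded sample mirroring the documented return shape (trimmed to the
# fields used here). In real use this would come from
# SecLatestFilingsRssFeedParser.parse(xml_document).
result =
  {:ok,
   %{
     entries: [
       %{
         cik: "0000050104",
         category: "10-K",
         title: "10-K - TESORO CORP /NEW/ (0000050104) (Filer)"
       },
       %{
         cik: "0000092230",
         category: "10-K",
         title: "10-K - BB&T CORP (0000092230) (Filer)"
       }
     ],
     updated: "2016-02-25T18:54:17-05:00"
   }}

case result do
  {:ok, feed} ->
    IO.inspect(FilingsExample.titles_for(feed, "10-K"))

  other ->
    # The failure shape is not documented in this README, so anything that
    # is not {:ok, feed} is only reported as-is.
    IO.inspect(other, label: "unexpected result")
end
```

Matching on the {:ok, feed} tuple first keeps the happy path explicit and leaves room to handle whatever the parser returns on failure.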
An entry's map contains a cik (the identifier the SEC uses for a company or security), an html_link to the filing, a text_link to the text version of the filing, a category which represents the category of filing (10-K, 10-Q, 4, etc.), an rss_feed_id which represents a unique id of the entry, a summary which is a short summary of the document, a filing title, and an updated_date. The feed is a map of those entries and an updated date of the feed.
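In the sample output, each text_link matches its html_link with the -index.htm suffix swapped for .txt, and each cik matches the 10-digit number in parentheses in the title. The sketch below illustrates only that observed relationship. It is an assumption about how such fields could be derived, not the library's actual implementation, and the LinkSketch module and its functions are hypothetical.

```elixir
defmodule LinkSketch do
  # Hypothetical helper: swaps the "-index.htm" suffix for ".txt", which is
  # the relationship the sample html_link/text_link pairs happen to show.
  def text_link(html_link) do
    String.replace_suffix(html_link, "-index.htm", ".txt")
  end

  # Hypothetical helper: returns the first parenthesized 10-digit number in
  # a title, or nil when there is none.
  def cik(title) do
    case Regex.run(~r/\((\d{10})\)/, title) do
      [_full_match, cik] -> cik
      nil -> nil
    end
  end
end

html_link =
  "http://www.sec.gov/Archives/edgar/data/50104/000005010416000055/0000050104-16-000055-index.htm"

IO.puts(LinkSketch.text_link(html_link))
IO.puts(LinkSketch.cik("10-K - TESORO CORP /NEW/ (0000050104) (Filer)"))
```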
Be bold: use this tool to bring some sanity to parsing the SEC's XML feed, and feel free to contribute!
This project is available on Hex, and the package can be installed as follows:
- Add sec_latest_filings_rss_feed_parser to your list of dependencies in mix.exs:

      def deps do
        [{:sec_latest_filings_rss_feed_parser, "~> 0.0.6"}]
      end

- Ensure sec_latest_filings_rss_feed_parser is started before your application:

      def application do
        [applications: [:sec_latest_filings_rss_feed_parser]]
      end

This library is under the MIT license.