WGLab/PhenCards

Add companies and foundations section


This has been a long time coming. I recently came across the 990finder. It works pretty darn well, and I think I can parse both the tables and the URL.

As for licensing, nothing says I have to pay them a subscription for the data, although they do offer paid APIs and such. Nothing says I can't scrape their site, or that I'd have to pay a fee to include it in ours like with KEGG, though...

Sounds like the IRS may have this in machine-readable format!

https://registry.opendata.aws/irs990/

So there are two ways I can get this information for free, with no strings attached. One is:
https://www.open990.org/catalog/
It is licensed under CC BY-NC 4.0. Since ours is MIT-licensed and purely academic, I have no issue with this.

The other is the IRS.gov data hosted on AWS, which I am working on.
https://docs.opendata.aws/irs-990/readme.html
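Once a yearly index file is downloaded from that bucket, picking out the foundation filings could look roughly like the sketch below. It assumes the index layout as we understand it from that readme (a JSON object with a "Filings<YEAR>" list whose records carry EIN, OrganizationName, FormType and URL) and that private foundations show up as FormType "990PF". The function names are hypothetical helpers, not part of any IRS or AWS tool.

```python
import json
from typing import Dict, Iterator, List


# Assumption: a yearly index (e.g. index_2019.json) is a JSON object keyed
# "Filings<YEAR>" whose value is a list of filing records with "EIN",
# "OrganizationName", "FormType" and "URL" fields.
def iter_filings(index_text: str, year: int) -> Iterator[Dict[str, str]]:
    """Yield every filing record listed in one yearly index file."""
    data = json.loads(index_text)
    yield from data.get(f"Filings{year}", [])


def foundation_filings(index_text: str, year: int) -> List[Dict[str, str]]:
    """Keep only private-foundation returns (assumed FormType "990PF")."""
    return [
        {
            "ein": rec["EIN"],
            "name": rec["OrganizationName"],
            "xml_url": rec["URL"],  # link to the public, unparsed XML return
        }
        for rec in iter_filings(index_text, year)
        if rec.get("FormType") == "990PF"
    ]


if __name__ == "__main__":
    # Tiny made-up sample in the assumed index layout (not real filings).
    sample = json.dumps({
        "Filings2019": [
            {"EIN": "000000001", "OrganizationName": "EXAMPLE FOUNDATION",
             "FormType": "990PF", "URL": "https://example.org/a_public.xml"},
            {"EIN": "000000002", "OrganizationName": "EXAMPLE CHARITY",
             "FormType": "990", "URL": "https://example.org/b_public.xml"},
        ]
    })
    for row in foundation_filings(sample, 2019):
        print(row["ein"], row["name"], row["xml_url"])
```

Filtering on the index first means only the foundation XML files ever need to be fetched.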

The only resource I found useful for getting awscli working with aws configure: https://aws.amazon.com/blogs/security/wheres-my-secret-access-key/

So I'm building indexes for this and for the OHDSI data once they're parsed. Then I just need to add them to the site.

Parsed from the IRS foundations file: 416,880 entries (2019 only).
Parsed from the Open990 grants file: 913,386 entries (TY 2017 + 2018).
Parsed from the Open990 foundations file: 123,581 entries (also TY 2017 + 2018).

The IRS data provides links to public XML files on S3 that hold all the grant information, though only as raw, unparsed XML. I still think this is okay.
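If we ever want to pull the grants out of those XML returns ourselves, a rough sketch with the standard library could look like this. The namespace and tag names (GrantOrContributionPdDurYrGrp, RecipientBusinessName/BusinessNameLine1Txt, RecipientPersonNm, GrantOrContributionPurposeTxt, Amt) are our assumption about the post-2013 990-PF e-file schema; older returns use different tags and would need their own mapping. parse_pf_grants is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional

# Assumption: returns use the IRS e-file namespace and the post-2013 990-PF
# tag names below; older filings use different names and need a mapping.
NS = "{http://www.irs.gov/efile}"


def _text(elem: ET.Element, path: str) -> Optional[str]:
    """Return stripped text at a namespaced path like 'A/B', or None."""
    found = elem.find("/".join(NS + part for part in path.split("/")))
    return found.text.strip() if found is not None and found.text else None


def parse_pf_grants(xml_text: str) -> List[Dict[str, object]]:
    """Extract grants paid during the year from one 990-PF XML return."""
    root = ET.fromstring(xml_text)
    grants = []
    for grp in root.iter(NS + "GrantOrContributionPdDurYrGrp"):
        # A recipient is either an organization or a named person.
        recipient = (_text(grp, "RecipientBusinessName/BusinessNameLine1Txt")
                     or _text(grp, "RecipientPersonNm"))
        amount = _text(grp, "Amt")
        grants.append({
            "recipient": recipient,
            "purpose": _text(grp, "GrantOrContributionPurposeTxt"),
            "amount": int(amount) if amount else None,  # whole dollars
        })
    return grants
```

Missing fields come back as None instead of raising, since individual returns vary a lot in what they fill in.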

I have now parsed this data and added it to Elasticsearch indices.
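For loading parsed rows into Elasticsearch, one approach is to wrap each row as a bulk action dict with "_index" and "_source" keys, which is the shape elasticsearch-py's helpers.bulk accepts. This is a minimal sketch, not the loader actually used here: bulk_actions, the id_of hook and any index name such as "foundations" are hypothetical.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional


def bulk_actions(
    records: Iterable[Dict[str, object]],
    index: str,
    id_of: Optional[Callable[[Dict[str, object]], str]] = None,
) -> Iterator[Dict[str, object]]:
    """Wrap parsed records as actions in the shape helpers.bulk accepts."""
    for rec in records:
        action: Dict[str, object] = {"_index": index, "_source": rec}
        if id_of is not None:
            # A stable _id makes re-running the loader overwrite documents
            # instead of inserting duplicates.
            action["_id"] = id_of(rec)
        yield action
```

With elasticsearch-py installed, this generator could be passed straight to helpers.bulk(client, bulk_actions(rows, "foundations")), so hundreds of thousands of rows stream in without building one giant list in memory.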

Closed w/ #66.