jsfenfen/990-xml-reader

Download all 990's from one year

Closed this issue · 3 comments

How would I download all the 990 data from one year? Is that possible?

The examples are very specific, but what I'd like to do is pull all the data and then throw out the parts I don't need.

All of the e-filed 990s are available in an S3 bucket. There are lotsa tools to download 'em; here's how you could "sync" a local directory to the contents of the bucket using the AWS command line tool:
aws s3 sync s3://irs-form-990/ ./

If you just want one year, you can probably achieve that with the --include / --exclude flags, which operate on file names. Finally, I believe there is a CSV index file for each year at YYYY_index.csv; you could figure out which files are in a year from that, although there's no guarantee that the file is correct, of course.
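As a rough sketch of the --include / --exclude approach: the helper below (sync_year is a made-up name, not part of any tool) wraps the sync command so that only keys starting with a given year are copied. It assumes object keys in the bucket begin with the four-digit year, which is worth checking against a few real file names before relying on it.

```shell
# Sketch only: sync objects whose key starts with the given year.
# ASSUMPTION: keys in s3://irs-form-990/ begin with the four-digit year.
# Usage: sync_year YEAR [DEST_DIR]
sync_year() {
  # $1 = year, e.g. 2017; $2 = optional local directory (default ./990s_YEAR)
  # Filters are applied in order: exclude everything, then re-include
  # only the keys that match the "YEAR*" pattern.
  aws s3 sync s3://irs-form-990/ "${2:-./990s_$1}" --exclude "*" --include "$1*"
}
```

For example, `sync_year 2017` would sync into ./990s_2017. Adding --dryrun to the aws command inside the function should print what would be copied without downloading anything, which is a cheap way to check that the pattern matches what you expect.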

You might also look at this repo, which is not as well documented but is aimed more at dealing with many files in bulk: https://github.com/jsfenfen/990-xml-database. There are also more resources available here: https://registry.opendata.aws/irs990/

When you say there's no guarantee the files are correct, do you mean that some of the records are in the wrong year? Or that it's just missing a lot of data?

The end goal of my project is to assess the impact of revenue streams on nonprofit sustainability. If there's a particular group that's been removed from the data, that would definitely not be good. 😫

Thank you for your wisdom!