billimarie/prosecutor-database

[FEATURE] Research & Implementation: Data Sources


Tracking alternative data sources that our project might be able to use. This would remove the need to find data manually. Led by @janel-developer, who discovered the CourtListener API.

Might be helpful to look at scripts we developed several years ago:

Thanks! I’ll have a look at the scripts too :)

Tagging @michaelknowles, who is interested in scripts to replace our current manual process.

@janel-developer I was thinking we would have two scripts: one to scrape data into some format, probably the existing JSON format, and another to upload the data into the DB and the images into the CDN.

This way we can develop, test, and run these processes independently. This will also make it easier to inspect data before uploading, if needed.
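A minimal sketch of that two-stage split, assuming a simplified record shape (`name`, `county`, `image_url`) that stands in for the project's real JSON format, which isn't shown here. The `scrape` and `upload` functions and the `save_record`/`save_image` callbacks are hypothetical placeholders for the real DB and CDN clients. The only contract between the two stages is the JSON file, so it can be inspected before anything is uploaded.

```python
from __future__ import annotations

import json
import tempfile
from pathlib import Path
from typing import Callable


def scrape(raw_rows: list[dict], out_path: Path) -> Path:
    """Stage 1 (hypothetical): normalize scraped rows and write them to JSON for review."""
    records = [
        {
            "name": row["name"].strip(),
            "county": row["county"].strip(),
            "image_url": row.get("image_url"),  # may be missing or None
        }
        for row in raw_rows
    ]
    out_path.write_text(json.dumps(records, indent=2))
    return out_path


def upload(
    in_path: Path,
    save_record: Callable[[dict], None],  # placeholder for the real DB insert
    save_image: Callable[[str], None],    # placeholder for the real CDN upload
) -> int:
    """Stage 2 (hypothetical): read the reviewed JSON and hand each record to the uploaders."""
    records = json.loads(in_path.read_text())
    for record in records:
        save_record(record)
        if record.get("image_url"):
            save_image(record["image_url"])
    return len(records)


if __name__ == "__main__":
    rows = [{"name": " Jane Doe ", "county": "Alameda", "image_url": "https://example.com/jd.jpg"}]
    with tempfile.TemporaryDirectory() as tmp:
        path = scrape(rows, Path(tmp) / "prosecutors.json")
        db, cdn = [], []  # in-memory stand-ins for the DB and CDN
        count = upload(path, db.append, cdn.append)
        print(count, db[0]["name"], cdn)
```

Because the uploaders are passed in as callables, each stage can be tested on its own with in-memory lists before being wired to the real DB and CDN.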

What are your thoughts?

That makes sense to me @michaelknowles

I've heard back from Mike Lissner of the CourtListener project. He is going to give me access to the attorneys endpoint, and I'll have a look at what is there. I also asked if he could point me to anything that describes the data they collect; there is a lot of it, and I don't want to make assumptions about exactly what it represents.

Currently I'm looking at their position data and people data. Once I have access I'll have a look at the attorney data.

Just an update: I've looked at the CourtListener data, but it is mostly about judges and opinions. There is people data, and some of those people are district attorneys, but it isn't really useful for us here.

I'm going to spend a little time trying to find other data sources. If I can't find one, I can help @michaelknowles improve the scraping and uploading scripts.

@billimarie, can you tell me more about the court parsing script? I had a look at that commit, and it looks like it's for a Rails app from the LadyHacks2017 repo.

As I look for data sources, I can collect URLs for sites that may be easy to scrape, in case I can't find a useful API. I can just list them in this issue, unless you want to create (or already have) a separate issue for script improvements.
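One possible first filter for candidate URLs is a site's robots.txt policy. Below is a minimal sketch using Python's standard `urllib.robotparser`. The user-agent string `prosecutor-database-bot` and the example URLs are hypothetical. It only checks robots.txt rules, not whether a page is actually easy to parse or what the site's terms of use allow.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser


def allowed_urls(
    robots_txt: str,
    candidate_urls: list[str],
    user_agent: str = "prosecutor-database-bot",  # hypothetical bot name
) -> list[str]:
    """Return the candidate URLs that the given robots.txt text permits for user_agent."""
    parser = RobotFileParser()
    # parse() accepts robots.txt content as lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return [url for url in candidate_urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    urls = [
        "https://example.org/das/list.html",
        "https://example.org/private/data.html",
    ]
    print(allowed_urls(robots, urls))
```

For a live site, `RobotFileParser.set_url(...)` followed by `read()` fetches the real robots.txt instead of passing the text in by hand.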