DOD redacted logic too trusting
Closed this issue · 2 comments
Some DOD reports that are marked as redacted actually have links to the reports.
See DODIG-2014-123 on http://www.dodig.mil/pubs/index.cfm?fy=2014. I believe these are reports that were originally redacted but later released.
Does the scraper skip over these entirely? I can't find any logic that would do that, though it's a tricky scraper.
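One way to make the redaction logic less trusting is to keep "has a (Redacted) label" and "is unreleased" as separate flags, and to decide "unreleased" from whether a report link exists rather than from the label. This is a minimal sketch, not the actual `inspectors/dod.py` code; `classify_report` and `REDACTED_MARKER` are hypothetical names.

```python
import re
from typing import Dict, Optional

# Hypothetical pattern for the "(Redacted)" label in a report title.
REDACTED_MARKER = re.compile(r"\(\s*redacted\s*\)", re.IGNORECASE)


def classify_report(title: str, report_url: Optional[str]) -> Dict[str, bool]:
    """Hypothetical helper: derive flags for one row of the DOD IG index.

    The "(Redacted)" label only sets the `redacted` flag. Whether the report
    is unreleased depends solely on whether the row links to a report file,
    so a redacted report that has since been released is not skipped.
    """
    redacted = bool(REDACTED_MARKER.search(title))
    unreleased = not report_url  # None or "" means no link was found
    return {"redacted": redacted, "unreleased": unreleased}
```

With this split, a row like DODIG-2014-123 (labeled "(Redacted)" but linked) would come out as redacted but not unreleased.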
I suspect this report was originally unreleased, and that when it was released, both the link and the word "(Redacted)" were added.
If that's the case, and the report's title/ID was released in 2014 but its redacted text was released in 2015, we would not catch it in the usual nightly run, since that run only looks at the latest year. Maybe that's an assumption worth revisiting.
Running `./inspectors/dod.py --year=2014` picks up this report and doesn't mark it as unreleased, so I think this is instead an issue of whether we should regularly fetch information from farther back in time than we do. I'm opening a new issue for that.
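For the follow-up issue, a nightly run could fetch the latest year plus a small lookback window instead of only the latest year. This is a hypothetical sketch: `years_to_scrape`, `index_urls`, and the `lookback` parameter are illustrative names, not existing scraper options; only the `index.cfm?fy=` URL pattern comes from the DOD IG page linked above.

```python
from typing import List

# URL pattern taken from the DOD IG publications index linked in this issue.
INDEX_URL = "http://www.dodig.mil/pubs/index.cfm?fy={year}"


def years_to_scrape(current_year: int, lookback: int = 1) -> List[int]:
    """Hypothetical: return the years a nightly run should fetch, newest first."""
    if lookback < 0:
        raise ValueError("lookback must be non-negative")
    return [current_year - offset for offset in range(lookback + 1)]


def index_urls(current_year: int, lookback: int = 1) -> List[str]:
    """Hypothetical: build the index page URL for each year in the window."""
    return [INDEX_URL.format(year=y) for y in years_to_scrape(current_year, lookback)]
```

With `lookback=1`, a run in 2015 would revisit the 2014 index and notice a report whose link was added after its title was first published. A larger window catches later releases at the cost of more requests per night.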
Oops, you're correct. I saw that it wasn't on oversight.io and forgot that we don't regularly scrape historical years.