datamade/nyc-council-councilmatic

First search result for taxi (which we encourage the user to try out) is a bill that has been deleted from Legistar

Closed this issue · 8 comments

Weird.

So, this bill does not exist on the Legistar UI, but it does exist in the web API:
https://webapi.legistar.com/v1/nyc/matters/58413?token=....

Here's the OCD API for reference.

@hancush - do we want to scrape data not available in the Legistar user interface?

I'd like to know why it's not in Legistar. (I searched for the identifier and didn't come up with anything.)

But in the meantime, it looks like this is another instance of Legistar returning a 200 when it shouldn't.

In [1]: import requests

In [2]: r = requests.get('http://legistar.council.nyc.gov/LegislationDetail.aspx?ID=3289669&GUID=718D3F80-59AB-4D69-B3B0-C832B0A506E8')

In [3]: r.status_code
Out[3]: 200

In [4]: r.text
Out[4]: 'Invalid parameters!'

I think we should add a condition to _check_errors in python-legistar that raises a ScrapeError when response.text is "Invalid parameters!" so we can skip these bills.
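A rough sketch of what that condition might look like. I haven't checked the actual signature of `_check_errors`, so this is written as a standalone helper: `check_invalid_parameters` and the `ScrapeError` class defined here are stand-ins, not the real python-legistar code. The only thing taken from the session above is the response body, "Invalid parameters!".

```python
from types import SimpleNamespace  # only used for the stand-in response below


class ScrapeError(Exception):
    """Stand-in for the scraper's own error type (hypothetical here)."""


# Body Legistar sends with a 200 status when the detail page does not exist.
INVALID_PARAMETERS_BODY = "Invalid parameters!"


def check_invalid_parameters(response):
    """Raise ScrapeError if the response is Legistar's 200-with-error page.

    `response` only needs `.text` and `.url`, as on a requests.Response.
    """
    body = response.text.strip()
    if body == INVALID_PARAMETERS_BODY:
        raise ScrapeError("Legistar returned {!r} for {}".format(body, response.url))


# Stand-in for a real response, so the sketch runs without a network call.
fake = SimpleNamespace(text="Invalid parameters!", url="http://example.com/detail")
try:
    check_invalid_parameters(fake)
except ScrapeError as error:
    print("skipping bill:", error)
```

The caller would catch the error and skip the bill rather than treating the page as a valid detail page.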

It looks like a version of that bill does exist in Legistar – do we have this version in the OCD API?

So, this actually seems like a case of a duplicate bill. There is an updated version of this bill in Legistar, the OCD API, and Councilmatic.

The bill was inserted again rather than updated because the identifier is slightly different – "T 2017-6878" vs. "Res 1762-2017". Unfortunately, we don't have a mechanism for deleting old information when this happens.

Seems like we want to avoid situations like this. Should we be checking bills for the same API source URL, perhaps? (The matter ID is consistent across versions here.) Alternatively, or additionally, perhaps we should check Legistar source URLs to see if they're active?
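To make the first idea concrete, here is a minimal sketch of grouping bills by the matter ID in their API source URL. It assumes we can get (identifier, API source URL) pairs out of the database; `matter_id` and `find_duplicate_matters` are made-up names, not anything in pupa or the scraper, and the sample rows are only illustrative.

```python
import re
from collections import defaultdict

# Matches the numeric matter ID in URLs like .../v1/nyc/matters/58413
MATTER_ID_PATTERN = re.compile(r"/matters/(\d+)")


def matter_id(api_url):
    """Return the matter ID from a web API URL, or None if there is none."""
    match = MATTER_ID_PATTERN.search(api_url)
    return match.group(1) if match else None


def find_duplicate_matters(bills):
    """Group bill identifiers by matter ID; keep groups with more than one bill.

    `bills` is an iterable of (identifier, api_source_url) pairs.
    """
    by_matter = defaultdict(list)
    for identifier, api_url in bills:
        key = matter_id(api_url)
        if key is not None:
            by_matter[key].append(identifier)
    return {key: ids for key, ids in by_matter.items() if len(ids) > 1}


# Illustrative rows only: two identifiers pointing at the same matter.
sample = [
    ("T 2017-6878", "https://webapi.legistar.com/v1/nyc/matters/58413"),
    ("Res 1762-2017", "https://webapi.legistar.com/v1/nyc/matters/58413"),
]
print(find_duplicate_matters(sample))
```

A check like this would only flag candidates. Deciding which version to keep, and deleting the stale one, is still the missing piece.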

Related to: opencivicdata/pupa#295

To close this issue, let's simply suggest another search query in the input bar....

I removed the duplicate bill from the OCD API and Councilmatic database (i.e., the bill with id "ocd-bill/afef2cb7-2b8d-4ce9-916b-34725ffa47f4", which duplicated this bill).

Since the missing bill has been removed from the database, I think this issue has been fixed -- is that right @reginafcompton?

Not yet! We actually need to change this:

SEARCH_PLACEHOLDER_TEXT = "Taxi, Resolution 815-2015, etc."

(The conversation above shows that, in addition to the Taxi bill, the suggested resolution, Resolution 815-2015, is not in Legistar or our databases.) Let's just suggest a bill that people can find, "Introduction 2018-0327". It will also be a nice test of the relevance search.