theodi/open-data-certificate

Campaign autocert doesn't handle links to other data portals

Closed this issue · 4 comments

Campaigns pick up datasets that are actually hosted on other portals. As a result, autocert tries to fetch the dataset metadata from the wrong URL.

The following search shows a few examples of these datasets:

http://data.sa.gov.au/data/dataset?q=threatened+species+state+lists&sort=extras_harvest_portal+asc%2C+score+desc%2C+metadata_modified+desc

Check out the ones tagged 'FROM DATA.GOV.AU'

There's a fair bit of cross-harvesting on Australian portals. Some call it the "no wrong door" approach: wherever you go, you'll find all the links. Data.gov.au and data.sa.gov.au do this the most. Not every government has adopted the idea.

The decision is:

  • By default, ignore datasets that are harvested from other portals, to avoid duplicate certificates.
  • Add a campaign option to include harvested datasets (work in progress in #1442)
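
For illustration only, the two rules above could be sketched roughly like this. This is not the code from #1442. It assumes each dataset is a CKAN-style dict whose `extras` list carries a `harvest_portal` entry when the dataset was harvested from another portal (a guess based on the `extras_harvest_portal` sort key in the search URL above). The names `is_harvested`, `select_campaign_datasets` and `include_harvested` are hypothetical.

```python
from typing import Any, Dict, Iterable, List

Dataset = Dict[str, Any]


def is_harvested(dataset: Dataset, marker_key: str = "harvest_portal") -> bool:
    """Return True if the dataset has a non-empty harvest marker in its extras.

    Assumption: harvested datasets carry an extra whose key is `marker_key`.
    """
    for extra in dataset.get("extras", []):
        if extra.get("key") == marker_key and extra.get("value"):
            return True
    return False


def select_campaign_datasets(
    datasets: Iterable[Dataset], include_harvested: bool = False
) -> List[Dataset]:
    """Pick the datasets a campaign should certify.

    By default, harvested datasets are skipped to avoid duplicate
    certificates; the hypothetical `include_harvested` flag opts them back in.
    """
    if include_harvested:
        return list(datasets)
    return [d for d in datasets if not is_harvested(d)]


# Made-up sample records, not real portal data.
example_datasets = [
    {"name": "local-dataset", "extras": []},
    {
        "name": "harvested-dataset",
        "extras": [{"key": "harvest_portal", "value": "data.gov.au"}],
    },
]

default_selection = select_campaign_datasets(example_datasets)
full_selection = select_campaign_datasets(example_datasets, include_harvested=True)
```

With the sample records, the default selection keeps only `local-dataset`, while passing `include_harvested=True` keeps both.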

I've updated #1442 to do the above 🎉

Suggest this can be closed based on #1442.