Esri/geoportal-server-harvester

enhance harvesting of data.gov through CKAN to include full metadata

Closed this issue · 0 comments

zguo commented

Currently, harvesting data.gov content through CKAN returns only a Dublin Core document. The full metadata can be retrieved as follows:

The data.gov CKAN API returns harvest_object_id in the extras field. Using that value, you can get the XML at
/harvest/object/[harvest_object_id]
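
As a rough sketch of this lookup, the snippet below builds the harvest object URL from a package record. It assumes the extras field is a list of {"key": ..., "value": ...} entries, which is the usual CKAN layout but is not confirmed above. The helper names and the hand-written sample record are hypothetical, and no network request is made.

```python
from typing import Optional

BASE_URL = "https://catalog.data.gov"


def extras_to_dict(package: dict) -> dict:
    """Flatten the extras list into a plain dict (assumed key/value layout)."""
    return {e["key"]: e.get("value") for e in package.get("extras", []) if "key" in e}


def harvest_object_url(package: dict, base_url: str = BASE_URL) -> Optional[str]:
    """Return the /harvest/object/ URL for a package, or None if no id is present."""
    object_id = extras_to_dict(package).get("harvest_object_id")
    if not object_id:
        return None
    return f"{base_url}/harvest/object/{object_id}"


# Hand-written sample standing in for the "result" part of a package_show response.
sample = {
    "name": "u-s-hourly-precipitation-data",
    "extras": [
        {"key": "harvest_object_id", "value": "ac0da4ab-0b88-48c4-af2b-00611df0d956"},
    ],
}

print(harvest_object_url(sample))
```

Returning None when the key is missing lets a harvester fall back to the Dublin Core document for datasets that were not harvested.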

To tell whether a dataset was harvested from an XML source or a datajson source, check the extras field for the key 'source_datajson_identifier'. If its value is 'true', the source is datajson and the harvest object metadata will be in JSON format.
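
The check described here might look like the following sketch. It assumes the extras field is a list of {"key": ..., "value": ...} entries and that the flag may arrive either as the string 'true' or as a boolean. The function name and both sample records are hypothetical.

```python
def harvest_object_format(package: dict) -> str:
    """Guess the harvest object format from the source_datajson_identifier extra."""
    for extra in package.get("extras", []):
        if extra.get("key") == "source_datajson_identifier":
            # Accept both the string 'true' and a JSON boolean true.
            if str(extra.get("value")).strip().lower() == "true":
                return "json"
    # No flag (or a non-true value): treat the source as XML.
    return "xml"


# Hypothetical records for illustration only.
datajson_package = {"extras": [{"key": "source_datajson_identifier", "value": "true"}]}
xml_package = {"extras": [{"key": "harvest_object_id", "value": "some-id"}]}

print(harvest_object_format(datajson_package))
print(harvest_object_format(xml_package))
```

A harvester could use the returned value to choose between an XML parser and a JSON parser before reading the harvest object.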

Examples:
https://catalog.data.gov/api/3/action/package_show?id=u-s-hourly-precipitation-data
https://catalog.data.gov/harvest/object/ac0da4ab-0b88-48c4-af2b-00611df0d956

https://catalog.data.gov/api/3/action/package_show?id=demographic-statistics-by-zip-code-acfc9
https://catalog.data.gov/harvest/object/810835b2-f684-495a-8c0e-d24b15bd2154