ckan/ckanext-harvest

Refactor code

With the recent changes in core and the move to Solr 8, we should consider reworking parts of this extension. There are a few issues I would point to as concerns:

  1. Creation of a HarvestObject for each dataset. This happens at the end of the gather stage, and with a significant number of datasets (e.g. 100k+) it could lead to performance issues. My suggestion is to add an internal method, _create_harvest_object, that could be called on every package_search iteration.
  2. Deleting packages that were deleted from the source. As mentioned in Ian's comment, we could use the recently_changed_packages_activity_list API to get the packages that need re-harvesting.
  3. Adding a harvesters tab to the CKAN admin page.
    #500

@ckan/core