ckan/ideas

Dataset workflow plugins (take 2)


wardi commented

This is a simpler proposal for workflow plugins than #108 based on the discussion and work started by @TkTech on plugin extras: ckan/ckan#3072

Plugin extras are hidden JSON fields attached to users, datasets, resources, groups and orgs. They are accessible from plugins but, by default, are not exposed as fields in the API or tracked in revisions/activities. By convention, plugins namespace the values they store in these fields; for example, my_dataset.plugin_extras['approval_workflow'] would be the place an approval-workflow plugin might store its data related to my_dataset.
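To illustrate the namespacing convention, here is a minimal sketch in plain Python. It does not use CKAN itself: the `Dataset` class and the `plugin_data` helper are hypothetical stand-ins, and only the `plugin_extras['approval_workflow']` layout comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Dataset:
    """Hypothetical stand-in for a CKAN dataset, not the real model class."""
    name: str
    title: str = ""
    # Hidden per-plugin JSON data, keyed by plugin namespace.
    plugin_extras: Dict[str, Dict[str, Any]] = field(default_factory=dict)


def plugin_data(dataset: Dataset, namespace: str) -> Dict[str, Any]:
    """Return the dict reserved for one plugin, creating it if needed."""
    return dataset.plugin_extras.setdefault(namespace, {})


my_dataset = Dataset(name="my-dataset", title="My dataset")

# Each plugin only touches its own namespace, so keys never collide.
plugin_data(my_dataset, "approval_workflow")["state"] = "pending"
plugin_data(my_dataset, "other_plugin")["state"] = "unrelated"
```

Because both plugins write a `state` key under different namespaces, neither overwrites the other.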

With plugin extras in place, a basic approval workflow plugin could store the changed fields for each dataset and resource in plugin extras, and provide views with the IBlueprint or IRoutes interfaces for editing and approving/reverting suggested changes. With the recent activity work in ckan/ckan#3485 and ckan/ckan#3972, it could also add activities to record when suggested changes are made and approved/reverted.

This type of workflow plugin doesn't need any other new interface, and would be much simpler to implement than storing and retrieving suggested changes as a new type of activity.
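A rough sketch of the suggest/approve/revert cycle described above, assuming datasets are plain dicts. The function names and the `suggested` key are made up for illustration; a real plugin would call these from its own views and would read and write plugin extras through CKAN rather than a dict.

```python
from typing import Any, Dict

NAMESPACE = "approval_workflow"  # namespace from the example above


def suggest_changes(dataset: Dict[str, Any], changes: Dict[str, Any]) -> None:
    """Store only the fields that differ from the dataset's current values."""
    extras = dataset.setdefault("plugin_extras", {}).setdefault(NAMESPACE, {})
    pending = extras.setdefault("suggested", {})
    for key, value in changes.items():
        if dataset.get(key) != value:
            pending[key] = value


def approve(dataset: Dict[str, Any]) -> Dict[str, Any]:
    """Apply the suggested changes to the dataset and clear them."""
    extras = dataset.get("plugin_extras", {}).get(NAMESPACE, {})
    applied = extras.pop("suggested", {})
    dataset.update(applied)
    return applied


def revert(dataset: Dict[str, Any]) -> Dict[str, Any]:
    """Discard the suggested changes without touching the dataset."""
    extras = dataset.get("plugin_extras", {}).get(NAMESPACE, {})
    return extras.pop("suggested", {})
```

The dataset's own fields stay untouched until `approve` runs, which is the point of keeping the suggestion in plugin extras.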

When datasets or resources are deleted, their plugin extras are deleted at the same time, so no special handling is required. If a dataset is updated directly (e.g. using the package_update API), the workflow plugin would be able to discard any proposed change, force the update to fail, or attempt to keep the change if the update affects different fields.
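The three options for handling a direct update could be sketched like this. The `Policy` enum, `UpdateRejected` and `reconcile` are hypothetical names, and dropping the whole suggestion when fields overlap is an assumption of this sketch, since the proposal does not say what should happen in that case.

```python
from enum import Enum
from typing import Any, Dict


class Policy(Enum):
    DISCARD = "discard"                    # drop any proposed change
    FAIL = "fail"                          # refuse the direct update
    KEEP_IF_DISJOINT = "keep_if_disjoint"  # keep if different fields are touched


class UpdateRejected(Exception):
    """Raised when a direct update is blocked by a pending suggestion."""


def reconcile(pending: Dict[str, Any],
              updated_fields: Dict[str, Any],
              policy: Policy) -> Dict[str, Any]:
    """Return the pending changes that survive a direct dataset update."""
    if not pending:
        return {}
    if policy is Policy.DISCARD:
        return {}
    if policy is Policy.FAIL:
        raise UpdateRejected("dataset has suggested changes awaiting approval")
    # KEEP_IF_DISJOINT: assumption, any overlapping field drops the suggestion.
    if set(pending) & set(updated_fields):
        return {}
    return dict(pending)
```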

@wardi would it not be better to store workflow-related information in a table dedicated to workflow rather than adding stuff to the dataset object (or other core objects)? This creates a cleaner setup where the direction of dependency is better: the workflow system depends on the dataset, not the other way round (you aren't changing the dataset object because of the workflow system).

I note that where this started, with User Extras (#16), it makes more sense, though even there I would be concerned to limit the use of this (it should really be specific stuff added to the user, and other extensions can create their own tables).

Overall I feel this could be something of an anti-pattern: rather than creating data in their own clean namespace, extensions add directly or indirectly to the core objects.
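For comparison, the dedicated-table alternative might look roughly like the sketch below, using an in-memory SQLite database. The table and column names are invented for illustration and are not CKAN's schema; the foreign key with `ON DELETE CASCADE` is one possible way to get the same cleanup-on-delete behaviour that plugin extras provide.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this for cascades

# Simplified stand-in for the core dataset table.
conn.execute("CREATE TABLE dataset (id TEXT PRIMARY KEY, title TEXT)")

# The workflow extension owns this table and points at the dataset,
# so the dependency runs from workflow to dataset only.
conn.execute(
    """
    CREATE TABLE workflow_suggestion (
        id INTEGER PRIMARY KEY,
        dataset_id TEXT NOT NULL REFERENCES dataset(id) ON DELETE CASCADE,
        changes TEXT NOT NULL
    )
    """
)

conn.execute("INSERT INTO dataset VALUES (?, ?)", ("ds-1", "Old title"))
conn.execute(
    "INSERT INTO workflow_suggestion (dataset_id, changes) VALUES (?, ?)",
    ("ds-1", json.dumps({"title": "New title"})),
)


def pending_for(dataset_id: str) -> list:
    """Return the suggested changes stored for one dataset, oldest first."""
    rows = conn.execute(
        "SELECT changes FROM workflow_suggestion WHERE dataset_id = ? ORDER BY id",
        (dataset_id,),
    ).fetchall()
    return [json.loads(row[0]) for row in rows]
```

Here the dataset row carries no workflow data at all, at the cost of the extension having to create and migrate its own table.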

@rufuspollock and @wardi any news on this feature? We want to build a data repository that uses some form of authoring workflow (with validation and rejection), and we are deciding on the technology at the moment.
Thanks

loleg commented

This is still a very relevant topic, and I believe it has seen progress in other corners of this project, in particular in ckan/datapusher and frictionless-ci. For some basic comparisons, I would suggest having a look at:

@dhamaris would love to hear what you decided on?

loleg commented

Thanks @rossjones for the pointer - some of these topics have moved over to the CKAN discussions. A couple of relevant threads: