Reprocessing from UI doesn't run all steps
Closed this issue · 5 comments
Reprocessing a dataset using the UI and stage VERBATIM_TO_INTERPRETED ran only the GRSciColl step (see interpretTypes below).
"message": "{\"datasetUuid\":\"e17bfff0-4cf4-4130-98ea-f9f053eaef4f\",\"attempt\":227,\"interpretTypes\":[\"GRSCICOLL\"],\"pipelineSteps\":[\"HDFS_VIEW\",\"INTERPRETED_TO_INDEX\",\"VERBATIM_TO_INTERPRETED\"],\"runner\":\"STANDALONE\",\"endpointType\":\"DWC_ARCHIVE\",\"extraPath\":null,\"validationResult\":{\"tripletValid\":false,\"occurrenceIdValid\":true,\"useExtendedRecordId\":null,\"numberOfRecords\":13527,\"numberOfEventRecords\":null},\"resetPrefix\":\"202309062059\",\"executionId\":3265940,\"datasetType\":null,\"routingKey\":\"occurrence.pipelines.verbatim.finished.standalone\",\"datasetInfo\":{\"datasetType\":null,\"containsOccurrences\":true,\"containsEvents\":false}}",
It should run all the steps, or give the user the choice.
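To check which interpretation steps a message actually requested, the escaped payload can be decoded in two passes. This is only a sketch: it uses a trimmed copy of the message above, and it assumes the `"message"` field sits inside an enclosing JSON object, which the excerpt does not show.

```python
import json

# Trimmed copy of the log entry above (only a few fields kept for brevity).
# Assumption: the "message" field is part of an enclosing JSON object.
raw_entry = r'{"message": "{\"datasetUuid\":\"e17bfff0-4cf4-4130-98ea-f9f053eaef4f\",\"attempt\":227,\"interpretTypes\":[\"GRSCICOLL\"],\"pipelineSteps\":[\"HDFS_VIEW\",\"INTERPRETED_TO_INDEX\",\"VERBATIM_TO_INTERPRETED\"]}"}'

# First pass: parse the outer log entry.
entry = json.loads(raw_entry)

# Second pass: the message itself is a JSON document stored as a string.
payload = json.loads(entry["message"])

print(payload["interpretTypes"])
print(payload["pipelineSteps"])
```

A single-element `interpretTypes` list, as here, means only that one interpretation step was requested.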
The API it calls is e.g. https://api.gbif.org/v1/pipelines/history/run?steps=DWCA_TO_VERBATIM&useLastSuccessful=true&reason=scientificNameID
One can add the following parameters to instruct all steps to run:
&interpretTypes=LOCATION,GRSCICOLL,TAXONOMY,METADATA,BASIC,TEMPORAL,CLUSTERING
However, rather than adding those to the UI, I suggest changing the API so that if no interpretTypes are given, all steps run by default, leaving the UI as it is. This would be more robust, since new steps may be added in the future, which would otherwise silently leave people's existing scripts out of date.
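For anyone scripting this, a minimal sketch of building the request URL with or without the explicit list might look like the following. The endpoint and the parameter names are taken from the URL above. The helper name `build_rerun_url` is hypothetical, and the HTTP method and authentication are not covered in this thread, so the sketch only constructs the URL and does not send it.

```python
from urllib.parse import urlencode

# Endpoint and parameter names as quoted in this issue.
BASE_URL = "https://api.gbif.org/v1/pipelines/history/run"

# The interpretTypes listed in this issue; more may be added in the future.
ALL_INTERPRET_TYPES = [
    "LOCATION", "GRSCICOLL", "TAXONOMY", "METADATA",
    "BASIC", "TEMPORAL", "CLUSTERING",
]


def build_rerun_url(steps, reason, interpret_types=None, use_last_successful=True):
    """Hypothetical helper: build the re-run URL, optionally with interpretTypes."""
    params = {
        "steps": ",".join(steps),
        "useLastSuccessful": str(use_last_successful).lower(),
        "reason": reason,
    }
    # Leaving interpretTypes out relies on the API defaulting to all steps,
    # which is the behaviour proposed in this issue.
    if interpret_types:
        params["interpretTypes"] = ",".join(interpret_types)
    # Keep commas unescaped so the URL matches the form shown above.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


default_url = build_rerun_url(["DWCA_TO_VERBATIM"], "scientificNameID")
explicit_url = build_rerun_url(
    ["DWCA_TO_VERBATIM"], "scientificNameID", interpret_types=ALL_INTERPRET_TYPES
)
print(default_url)
print(explicit_url)
```

Omitting `interpret_types` keeps scripts independent of the current list of steps, which is the robustness argument made above.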
Fixed in Registry
Awesome - thanks @muttcg
If we're using scripts, do we need to pass all the steps, or can we omit the parameter and have it default to all steps?
@timrobertson100 when you don't pass steps in the query, the registry adds all possible steps to the existing message list. So passing them is no longer necessary if you want to run all steps.
Thanks @muttcg