Roll back package uploads on metadata failure
thomashoneyman opened this issue · 4 comments
A recent package publish failed to write metadata to the registry after the tarball had already been uploaded to storage. The author retried the publishing workflow, but publishing failed because a tarball was already present in storage. See:
purescript/registry#336 (comment)
purescript/registry#336 (comment)
Specifically, in this section of publishRegistry, the Storage.upload and Registry.writeMetadata functions can throw exceptions, and the process is aborted if they do:
registry-dev/app/src/App/API.purs, lines 644 to 647 at 8177cd3
We should probably catch the Registry.writeMetadata exception and roll back the Storage.upload (i.e. issue a Storage.delete) before exiting the pipeline. Alternatively, we could build a sort of bracket functionality into the publish pipeline, where we record which resources have been modified and, on publish failure, roll back all of those modifications.
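To make the first option concrete, here is a minimal sketch of the catch-and-rollback shape written in plain Aff. The Storage and Registry records and the uploadThenWriteMetadata function are hypothetical stand-ins for the real handlers; the actual publishRegistry pipeline runs in the registry's own effect stack, not Aff.

```purescript
module RollbackSketch where

import Prelude

import Control.Monad.Error.Class (catchError, throwError)
import Effect.Aff (Aff)

-- Hypothetical stand-ins for the real Storage and Registry handlers.
type Storage =
  { upload :: String -> Aff Unit
  , delete :: String -> Aff Unit
  }

type Registry =
  { writeMetadata :: String -> Aff Unit
  }

-- Upload the tarball, then write metadata. If the metadata write throws,
-- delete the tarball again before re-throwing, so the pipeline still aborts
-- but a retried publish does not find a stale upload in storage.
uploadThenWriteMetadata :: Storage -> Registry -> String -> Aff Unit
uploadThenWriteMetadata storage registry package = do
  storage.upload package
  catchError (registry.writeMetadata package) \err -> do
    storage.delete package
    throwError err
```

For the bracket-style variant, Aff's bracket and generalBracket combinators provide cleanup hooks that run when the inner action fails, which could serve as a model for recording modified resources and undoing them on publish failure.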
The fix for this issue should include a test verifying that a failure in one part of the pipeline rolls back the other changes, such that we can re-issue the publish command and succeed. The tests for that are located here:
registry-dev/app/test/App/API.purs, lines 54 to 58 at 8177cd3
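As a rough illustration of what such a test could assert, the sketch below models the storage backend with an in-memory Ref and reuses the hypothetical uploadThenWriteMetadata from the sketch above; the real tests use the registry's own fixtures and handlers, so every name here is illustrative only.

```purescript
module RollbackSpec where

import Prelude

import Control.Monad.Error.Class (throwError, try)
import Data.Either (isLeft)
import Data.Maybe (Maybe(..), isNothing)
import Effect.Class (liftEffect)
import Effect.Exception (error)
import Effect.Ref as Ref
import Test.Spec (Spec, describe, it)
import Test.Spec.Assertions (shouldEqual)

import RollbackSketch (Registry, Storage, uploadThenWriteMetadata)

rollbackSpec :: Spec Unit
rollbackSpec = describe "publish rollback" do
  it "deletes the uploaded tarball when writing metadata fails" do
    -- An in-memory stand-in for storage: the Ref holds the uploaded package, if any.
    uploaded <- liftEffect $ Ref.new Nothing
    let
      storage :: Storage
      storage =
        { upload: \name -> liftEffect $ Ref.write (Just name) uploaded
        , delete: \_ -> liftEffect $ Ref.write Nothing uploaded
        }

      -- The metadata write always fails in this scenario.
      registry :: Registry
      registry = { writeMetadata: \_ -> throwError (error "git push failed") }
    result <- try (uploadThenWriteMetadata storage registry "example-package")
    isLeft result `shouldEqual` true
    -- The failed publish must not leave a tarball behind, so a retry can succeed.
    stored <- liftEffect $ Ref.read uploaded
    isNothing stored `shouldEqual` true
```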
I am not convinced about rolling back storage uploads: we support multiple storage backends, so would we try to roll all of them back if the pipeline fails? That feels messy.
I wonder if we should instead ensure that the pipeline can never fail after we push to the storage backend(s).
I don't see how we could ensure the pipeline never fails. We can do our best to avoid failure, but we're dealing with a remote git server, so there's always some possibility of failure.
Yeah I see. I guess my gut feeling on this is that once we have "committed" to a package publish, the only option is to move forward with it, and deletion should not be an option.
I am not quite sure what kind of infrastructure we'd need to ensure this; a few ideas:
- we could retry the git operations until they succeed (a small sketch of this follows the list)
- ...or have a place in the database for storing "pending" operations that are run in order; this would take care of our concerns with #646 as well
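For the first bullet, a bounded retry is easy to sketch; retrying forever could hang the pipeline, so this version gives up and rethrows after a fixed number of attempts. The retry helper and the gitPush name in the usage comment are hypothetical, not part of the registry codebase.

```purescript
module RetrySketch where

import Prelude

import Control.Monad.Error.Class (catchError, throwError)
import Data.Time.Duration (Milliseconds(..))
import Effect.Aff (Aff, delay)

-- Retry an action a bounded number of times, waiting between attempts.
retry :: forall a. Int -> Milliseconds -> Aff a -> Aff a
retry attempts wait action = catchError action \err ->
  if attempts <= 1 then
    throwError err
  else do
    delay wait
    retry (attempts - 1) wait action

-- Example usage, assuming some hypothetical `gitPush :: Aff Unit`:
--   retry 5 (Milliseconds 2000.0) gitPush
```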