purescript/registry-dev

Roll back package uploads on metadata failure

thomashoneyman opened this issue · 4 comments

A recent package publish uploaded the tarball to storage but then failed to write the package metadata to the registry. The author retried the publishing workflow, but publishing failed because the tarball was already present in storage. See:

purescript/registry#336 (comment)
purescript/registry#336 (comment)

Specifically, in this section of publishRegistry, the Storage.upload and Registry.writeMetadata functions can both throw exceptions, and the process is aborted when either one does:

Storage.upload manifest.name manifest.version tarballPath
Log.debug $ "Adding the new version " <> Version.print manifest.version <> " to the package metadata file."
let newMetadata = metadata { published = Map.insert manifest.version { hash, ref: payload.ref, publishedTime, bytes } metadata.published }
Registry.writeMetadata manifest.name (Metadata newMetadata)

We should probably catch the Registry.writeMetadata exception and roll back the Storage.upload (i.e. issue a Storage.delete) before exiting the pipeline. Alternatively, we could build a sort of bracket functionality into the publish pipeline, where we record which resources have been modified and, on publish failure, roll back all of those modifications.
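
For illustration, here is a minimal sketch of the first option. It is written against plain Aff rather than the registry's actual Run-based effect stack, and upload, delete, and writeMetadata are hypothetical stand-ins for Storage.upload, Storage.delete, and Registry.writeMetadata:

module RollbackSketch where

import Prelude

import Control.Monad.Error.Class (throwError)
import Data.Either (Either(..))
import Effect.Aff (Aff, attempt)

-- Hypothetical stand-ins for the real effects used by the pipeline.
type PublishOps =
  { upload :: Aff Unit
  , delete :: Aff Unit
  , writeMetadata :: Aff Unit
  }

-- Upload the tarball, then attempt the metadata write. If the write
-- fails, delete the tarball before rethrowing, so a retried publish
-- starts from a clean state.
publishWithRollback :: PublishOps -> Aff Unit
publishWithRollback ops = do
  ops.upload
  result <- attempt ops.writeMetadata
  case result of
    Left err -> do
      ops.delete
      throwError err
    Right _ -> pure unit

The bracket-style alternative would generalize this: each step that modifies a resource also registers a rollback action, and any failure later in the pipeline runs all registered rollbacks before exiting.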

The fix for this issue should include a test verifying that a failure in one part of the pipeline rolls back the other changes, so that we can re-issue the publish command and succeed. The tests for that are located here:

Spec.describe "API pipelines run correctly" $ Spec.around withCleanEnv do
  Spec.it "Publish" \{ workdir, index, metadata, storageDir, githubDir } -> do
    let testEnv = { workdir, index, metadata, username: "jon", storage: storageDir, github: githubDir }
    Assert.Run.runTestEffects testEnv do
      -- We'll publish effect@4.0.0
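
As a rough sketch of the shape such a test could take, here is a simplified standalone spec that uses an in-memory Ref as a stand-in for the storage backend, rather than the registry's withCleanEnv / runTestEffects environment:

module Test.RollbackSpec where

import Prelude

import Control.Monad.Error.Class (throwError)
import Data.Either (Either(..), isLeft)
import Effect.Aff (Aff, attempt)
import Effect.Class (liftEffect)
import Effect.Exception (error)
import Effect.Ref as Ref
import Test.Spec (Spec, describe, it)
import Test.Spec.Assertions (shouldEqual)

spec :: Spec Unit
spec = describe "Publish rollback" do
  it "removes the uploaded tarball when the metadata write fails" do
    -- A Ref standing in for whether a tarball exists in storage.
    storage <- liftEffect (Ref.new false)
    let
      publish :: Aff Unit
      publish = do
        liftEffect (Ref.write true storage)
        -- Simulate the metadata write throwing after the upload succeeds.
        result <- attempt (throwError (error "metadata write failed") :: Aff Unit)
        case result of
          Left err -> liftEffect (Ref.write false storage) *> throwError err
          Right _ -> pure unit
    -- The first attempt fails...
    firstAttempt <- attempt publish
    isLeft firstAttempt `shouldEqual` true
    -- ...but leaves storage empty, so re-issuing the publish can succeed.
    stored <- liftEffect (Ref.read storage)
    stored `shouldEqual` false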

f-f commented

I am not convinced about rolling back storage uploads: we support multiple storage backends, so would we try to roll all of them back if the pipeline fails? That feels messy.

I wonder if we should instead ensure that the pipeline can never fail after we push to the storage backend(s).

thomashoneyman commented

I don’t see how we could ensure the pipeline never fails. We can do our best to avoid failure, but we’re dealing with a remote git server, so there’s always some possibility of failure.

f-f commented

Yeah, I see. I guess my gut feeling on this is that once we have "committed" to a package publish, then the only option is to move forward with it, and deletion should not be an option.
I am not quite sure what kind of infrastructure we'd need to ensure this; a few ideas:

  • we could retry the git operations until they succeed (a minimal retry sketch follows this list)
  • ...or have a place in the database for storing "pending" operations that are going to be run in order; this would take care of our concerns with #646 as well
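
For illustration, the first idea could look roughly like the following standalone Aff helper with a bounded attempt count; the action here is a hypothetical stand-in for the git push that writes metadata, not the pipeline's actual git layer:

module RetrySketch where

import Prelude

import Control.Monad.Error.Class (throwError)
import Data.Either (Either(..))
import Data.Time.Duration (Milliseconds(..))
import Effect.Aff (Aff, attempt, delay)

-- Retry an action up to `attempts` times, waiting between attempts.
-- In practice we would probably also want exponential backoff and
-- logging of each failed attempt.
retry :: forall a. Int -> Milliseconds -> Aff a -> Aff a
retry attempts wait action = do
  result <- attempt action
  case result of
    Right a -> pure a
    Left err
      | attempts <= 1 -> throwError err
      | otherwise -> do
          delay wait
          retry (attempts - 1) wait action

-- Example usage (hypothetical pushMetadata action):
-- retry 5 (Milliseconds 1000.0) pushMetadata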