Don't fail publishing when registry repos are behind the upstream

In purescript/registry#336 (comment) a package failed to commit and push metadata because the local checkout of the registry repo was behind the upstream by one commit. I don't think this should happen — we should be able to pull to re-sync and then try to push again.

The error in this case arose from the implementation of WriteMetadata:

registry-dev/app/src/App/Effect/Registry.purs

Lines 250 to 271 in 8177cd3

    
           WriteManifest manifest@(Manifest { name, version }) reply -> map (map reply) Except.runExcept do 
        
             let formatted = formatPackageVersion name version 
        
             Log.info $ "Writing manifest for " <> formatted <> ":\n" <> printJson Manifest.codec manifest 
        
             index <- Except.rethrow =<< handle env (ReadAllManifests identity) 
        
             case ManifestIndex.insert manifest index of 
        
               Left error -> 
        
                 Except.throw $ Array.fold 
        
                   [ "Can't insert " <> formatted <> " into manifest index because it has unsatisfied dependencies:" 
        
                   , printJson (Internal.Codec.packageMap Range.codec) error 
        
                   ] 
        
               Right updated -> do 
        
                 result <- writeCommitPush (CommitManifestEntry name) \indexPath -> do 
        
                   ManifestIndex.insertIntoEntryFile indexPath manifest >>= case _ of 
        
                     Left error -> Except.throw $ "Could not insert manifest for " <> formatted <> " into its entry file in WriteManifest: " <> error 
        
                     Right _ -> pure $ Just $ "Update manifest for " <> formatted 
        
                 case result of 
        
                   Left error -> Except.throw $ "Failed to write and commit manifest: " <> error 
        
                   Right r -> do 
        
                     case r of 
        
                       Git.NoChange -> Log.info "Did not commit manifest because it did not change." 
        
                       Git.Changed -> Log.info "Wrote and committed manifest." 
        
                     Cache.put _registryCache AllManifests updated

...but it applies everywhere we use writeCommitPush, which includes writing metadata, manifests, package sets, and other registry data to the various registry repositories. There's a narrow window to get behind the upstream — you'd have to get behind in the time between the pull but before the push in the implementation linked below — but it's certainly possible:

registry-dev/app/src/App/Effect/Registry.purs

Lines 678 to 693 in 8177cd3

    
           -- | Write a file to the repository associated with the commit key, given a 
        
           -- | callback that takes the file path of the repository on disk, writes the 
        
           -- | file(s), and returns a commit message which is used to commit to the 
        
           -- | repository. The result is pushed upstream. 
        
           writeCommitPush :: CommitKey -> (FilePath -> Run _ (Maybe String)) -> Run _ (Either String GitResult) 
        
           writeCommitPush commitKey write = do 
        
             let repoKey = commitKeyToRepoKey commitKey 
        
             pull repoKey >>= case _ of 
        
               Left error -> pure (Left error) 
        
               Right _ -> do 
        
                 let path = repoPath repoKey 
        
                 write path >>= case _ of 
        
                   Nothing -> pure $ Left $ "Failed to write file(s) to " <> path 
        
                   Just message -> commit commitKey message >>= case _ of 
        
                     Left error -> pure (Left error) 
        
                     Right _ -> push repoKey

To drill down even further, the specific place where this error arises is in the call to Git.gitCommit. If we are behind the upstream then we bail out of the commit.

registry-dev/app/src/App/CLI/Git.purs

Lines 181 to 200 in 8177cd3

    
           -- | Commit file(s) at the given commit key to the given repository. 
        
           gitCommit :: forall r. GitCommitArgs -> FilePath -> Run (LOG + AFF + r) (Either String GitResult) 
        
           gitCommit { address: { owner, repo }, committer, commit, message } cwd = Except.runExcept do 
        
             let formatted = owner <> "/" <> repo 
        
             let exec = withGit cwd 
        
             let inRepoErr error = " in local checkout " <> cwd <> ": " <> error 
        
             Log.debug $ "Committing to the " <> formatted <> " repo at " <> cwd <> " with message " <> message 
        
             -- First we fetch the origin to make sure we're up-to-date. 
        
             status <- gitStatus cwd 
        
             when (status.branch `Array.notElem` [ "main", "master" ]) do 
        
               Log.warn $ "Not on 'main' or 'master' branch in local checkout " <> cwd <> ", there may be unexpected results." 
        
             case status.status of 
        
               Current -> pure unit 
        
               Ahead n -> Log.warn do 
        
                 "Ahead of origin by " <> show n <> " commits in local checkout of " <> cwd <> ", something may be wrong." 
        
               Behind n -> Except.throw do 
        
                 "Behind of origin by " <> show n <> " commits in local checkout of " <> cwd <> ", committing would cause issues."

It's OK that we bail out, but we shouldn't bail out of the entire pipeline. Instead, we could bail out of the commit, pull again (with --autostash), and then try to commit again. Only on some n number of failures to resync with the upstream would it be reasonable to fully terminate the process.

Is this really a problem? I don't think we gain anything in allowing multiple jobs to run concurrently, and we should just limit the amount of running jobs to a single one

It feels odd to arbitrarily limit jobs such that if someone else is uploading a package then I cannot publish mine. Isn’t it preferable to allow concurrent package uploads?

Certainly concurrent jobs are possible so long as the GitHub pipelines exist. We will have to rewrite the GitHub module to send requests to the server instead of executing anything in the action itself if we want control. (I know eventually we want to disable the GitHub API altogether.)

If we don’t want any concurrent operations (anything that touches a shared git resource, at least) then that would mean that the build matrix, daily importer, etc cannot run if a package is currently being published and vice versa. Isn’t that problematic?

Here's another example of this happening:

purescript/registry#344 was terminated because purescript/registry#345 modified the metadata during its run.

	WriteManifest manifest@(Manifest { name, version }) reply -> map (map reply) Except.runExcept do
	let formatted = formatPackageVersion name version
	Log.info $ "Writing manifest for " <> formatted <> ":\n" <> printJson Manifest.codec manifest
	index <- Except.rethrow =<< handle env (ReadAllManifests identity)
	case ManifestIndex.insert manifest index of
	Left error ->
	Except.throw $ Array.fold
	[ "Can't insert " <> formatted <> " into manifest index because it has unsatisfied dependencies:"
	, printJson (Internal.Codec.packageMap Range.codec) error
	]
	Right updated -> do
	result <- writeCommitPush (CommitManifestEntry name) \indexPath -> do
	ManifestIndex.insertIntoEntryFile indexPath manifest >>= case _ of
	Left error -> Except.throw $ "Could not insert manifest for " <> formatted <> " into its entry file in WriteManifest: " <> error
	Right _ -> pure $ Just $ "Update manifest for " <> formatted
	case result of
	Left error -> Except.throw $ "Failed to write and commit manifest: " <> error
	Right r -> do
	case r of
	Git.NoChange -> Log.info "Did not commit manifest because it did not change."
	Git.Changed -> Log.info "Wrote and committed manifest."
	Cache.put _registryCache AllManifests updated

	-- \| Write a file to the repository associated with the commit key, given a
	-- \| callback that takes the file path of the repository on disk, writes the
	-- \| file(s), and returns a commit message which is used to commit to the
	-- \| repository. The result is pushed upstream.
	writeCommitPush :: CommitKey -> (FilePath -> Run _ (Maybe String)) -> Run _ (Either String GitResult)
	writeCommitPush commitKey write = do
	let repoKey = commitKeyToRepoKey commitKey
	pull repoKey >>= case _ of
	Left error -> pure (Left error)
	Right _ -> do
	let path = repoPath repoKey
	write path >>= case _ of
	Nothing -> pure $ Left $ "Failed to write file(s) to " <> path
	Just message -> commit commitKey message >>= case _ of
	Left error -> pure (Left error)
	Right _ -> push repoKey

	-- \| Commit file(s) at the given commit key to the given repository.
	gitCommit :: forall r. GitCommitArgs -> FilePath -> Run (LOG + AFF + r) (Either String GitResult)
	gitCommit { address: { owner, repo }, committer, commit, message } cwd = Except.runExcept do
	let formatted = owner <> "/" <> repo
	let exec = withGit cwd
	let inRepoErr error = " in local checkout " <> cwd <> ": " <> error

	Log.debug $ "Committing to the " <> formatted <> " repo at " <> cwd <> " with message " <> message

	-- First we fetch the origin to make sure we're up-to-date.
	status <- gitStatus cwd
	when (status.branch `Array.notElem` [ "main", "master" ]) do
	Log.warn $ "Not on 'main' or 'master' branch in local checkout " <> cwd <> ", there may be unexpected results."

	case status.status of
	Current -> pure unit
	Ahead n -> Log.warn do
	"Ahead of origin by " <> show n <> " commits in local checkout of " <> cwd <> ", something may be wrong."
	Behind n -> Except.throw do
	"Behind of origin by " <> show n <> " commits in local checkout of " <> cwd <> ", committing would cause issues."