fluxcd/source-controller

Cryptic error message when git is not available

gecube opened this issue · 10 comments

gecube commented

Hi!

I got the following error message:

{"level":"error","ts":"2023-09-19T22:10:07.384Z","msg":"failed to checkout and determine revision: unable to list remote for 'ssh://git@****/zodia/infra/kubernetes': unknown error: remote: ","controller":"gitrepository","controllerGroup":"source.toolkit.fluxcd.io","controllerKind":"GitRepository","GitRepository":{"name":"flux-system","namespace":"flux-system"},"namespace":"flux-system","name":"flux-system","reconcileID":"ae69a081-4408-49c3-abe5-f37135fc1573","error":"failed to checkout and determine revision: unable to list remote for 'ssh://git@****/zodia/infra/kubernetes': unknown error: remote: "}

I am sure that GitLab was not available at that very moment, so this was a network error. Unfortunately, the error message gives no indication of what actually happened.

My expectation: the error message should clearly state that the git repo source is not available (due to a network error).

gecube commented

The same error message in slack:

makkes commented

The underlying error is generated by go-git here.

gecube commented

@makkes agreed, thanks for the prompt answer. Is there any chance of an easy fix?

makkes commented

This is an example message from GitLab:

remote: 
remote: ========================================================================
remote: 
remote: ERROR: The project you were looking for could not be found or you don't have permission to view it.

remote: 
remote: ========================================================================
remote: 
fatal: Could not read from remote repository.

The issue is that go-git only returns the first line of the message returned from the remote, and in this case that line isn't very useful.
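To illustrate the point, here is a minimal sketch in plain Go. It does not use go-git itself: `firstLine` only mimics the "first line only" behaviour described above, and `firstMeaningfulLine` is a hypothetical helper showing one way the useful line could be picked out of the GitLab message instead. Both names are made up for this example.

```go
package main

import (
	"fmt"
	"strings"
)

// gitlabMsg is the example GitLab message quoted above.
const gitlabMsg = "remote: \n" +
	"remote: ========================================================================\n" +
	"remote: \n" +
	"remote: ERROR: The project you were looking for could not be found or you don't have permission to view it.\n" +
	"remote: \n" +
	"remote: ========================================================================\n" +
	"remote: \n"

// firstLine mimics the behaviour described above: only the first line survives.
func firstLine(msg string) string {
	line, _, _ := strings.Cut(msg, "\n")
	return line
}

// firstMeaningfulLine is a hypothetical helper. It strips the "remote:" prefix
// and skips lines that are empty or consist only of "=" rulers.
func firstMeaningfulLine(msg string) string {
	for _, line := range strings.Split(msg, "\n") {
		text := strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(line), "remote:"))
		if strings.Trim(text, "=") == "" {
			continue // nothing but whitespace or decoration
		}
		return text
	}
	return ""
}

func main() {
	fmt.Printf("first line:       %q\n", firstLine(gitlabMsg))
	fmt.Printf("first meaningful: %q\n", firstMeaningfulLine(gitlabMsg))
}
```

With the message above, the first line is just the bare `remote: ` prefix, which matches the tail of the error in the original report, while the helper returns the `ERROR: ...` line.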

Hi, I started looking into this earlier and then got pulled into something else. Getting back to it, it looks like we have all the necessary clues about the problem above. Our go-git wrapper has a function made specifically to handle such errors: goGitError().
Wrapping the code that returns the above error with goGitError() would give a slightly better error.
But reading the current implementation of goGitError(), I think it was originally written for push operations only, maybe for image-automation-controller. Now that the same error shows up in a call that gets the remote head, it would be better to change the message to something more appropriate.

makkes commented

In this case (GitLab apparently unavailable) the message "check git secret has write access" would be plain misleading. The remote: message can potentially be caused by all kinds of issues.

I would much rather improve (and rename) the checkNotFoundError message to catch more of the most common errors.

pjbgf commented

@makkes the handling of error descriptions from the server is slightly better for HTTP than for SSH, as it tries to pass that information back to the user. I would be more than happy to review a PR to improve that for SSH upstream.

gecube commented

Colleagues, any progress? Is there anything that I could test?

makkes commented

I created a draft PR to mitigate this situation in go-git. /cc @pjbgf

makkes commented

The go-git PR has been merged, so next up is bumping the go-git version as soon as a new version has been released.