jgehrcke/github-repo-stats

Action fails in pull on new data branch

flaix opened this issue · 10 comments

flaix commented

The action does not require the data branch to exist up front. This is good. If the data branch does not exist, it will be created.

+ git checkout gitblit
error: pathspec 'gitblit' did not match any file(s) known to git
+ git checkout -b gitblit

But this breaks later in the script when a default pull is executed. The pull will have no tracked remote ref to operate on, since the branch is new. This results in the following error.

+ git pull
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.

    git pull <remote> <branch>

If you wish to set tracking information for this branch you can do so with:

    git branch --set-upstream-to=origin/<branch> gitblit

I believe the PR #30 is there to address this problem, from the looks of it. I haven't tested it, though.

Thank you @flaix and @fboemer!

Patched via #30.

flaix commented

I am really sorry to disappoint, but I just tested the commit in the PR, and it doesn't solve this problem. Sorry, I should have actually tested it, instead of just assuming that the PR fixed this error. 😞

With the change, the pull still doesn't work.

fatal: couldn't find remote ref gitblit

@fboemer , was that actually the problem you wanted to solve and did your change fix it for you?

I am not sure that the problem can be solved with a pull because it effectively will try to merge with a non-existent ref (the remote branch). At least I can't think of any from the top of my head, I would have to research this.

So I guess (ugh, guesswork again) if you want to keep the feature that a data branch can be created by a run, a different approach would be needed.

I am really sorry to disappoint, but I just tested the commit in the PR, and it doesn't solve this problem. Sorry, I should have actually tested it, instead of just assuming that the PR fixed this error. 😞

With the change, the pull still doesn't work.

fatal: couldn't find remote ref gitblit

@fboemer , was that actually the problem you wanted to solve and did your change fix it for you?

I am not sure that the problem can be solved with a pull because it effectively will try to merge with a non-existent ref (the remote branch). At least I can't think of any from the top of my head, I would have to research this.

So I guess (ugh, guesswork again) if you want to keep the feature that a data branch can be created by a run, a different approach would be needed.

Thanks for the report; I don't quite recall the error I was seeing (the related Actions artifacts have since expired) and I'm unfortunately out of Actions minutes for this month, so I can't reproduce for another week or so. @ajagann, do you have any insights here?

flaix commented

@jgehrcke , you may want to reopen this issue. Sorry, I was too quick in assuming this problem was already fixed.

The script does a pull and then push in order to handle collisions with jobs running in parallel. What about reversing that (to how a lazy developer would work), pushing, and if that fails, pulling to merge in the other jobs changes? Would that fix the parallel use case as well? It would handle this issue as the push would go through for an new branch not existing on the remote yet.

The script does a pull and then push in order to handle collisions with jobs running in parallel. What about reversing that

We could also keep the order and simply ignore git pull errors. The downside is that for programming errors we might instead of properly crashing the wrapper script end up staying in the pull-push loop.

flaix commented

Well, you defined a maximum time to stay in the loop, so eventually it would error out at some point. The question is, what other errors could the pull run into at this point. Merge conflicts is the only one I can think of.

The question is, what other errors could the pull run into at this point.

Not so sure. What are all the expected failure modes, and which ones do we not want to 'ignore'?

I mean, for example for the error discussed here in this issue where pull fails with There is no tracking information for the current branch -- there it was a good idea to fail/crash fast.

I'll think about this a tiny more soon. I think we're getting close. Thanks for all the valuable thinking here here!

flaix commented

Yes and no, I would argue that in this case discussed the crash was not helpful as it would have remedied itself with the following (first) push.

I'm not sure there is an elegant way. I didn't find a way for an "atomic" "check-for-branch-and-push" with the porcelain git commands. Right now I do a push first, then the pull and another push if the first push failed. That works, but in a scenario where the pull/push is necessary, i.e. where concurrent pushes occur, the first push is bound to fail. It works for me as I have it set up with a max parallel = 1 setting.

I would argue that in this case discussed the crash was not helpful as it would have remedied itself with the following (first) push.

Accidental treatment of special cases is something I don't want to have in code :D But yes you're right, by now it's not accidental anymore. We're ignoring pull errors now: https://github.com/jgehrcke/github-repo-stats/pull/35/files#diff-6f9d41d046756f0ddc2fcee0626bdb50100d12b88f293734eff742818e03efa2L214 -- and I have noted the special case in the corresponding code comment.