Bug: git-sync v4 implementation stuck in loop after unexpected worktree removal
bakome opened this issue · 3 comments
Description
The issue appears when the .worktree/{hash} directory is suddenly deleted or lost. The next sync creates the worktree again, but the code then deletes the files immediately. This loop repeats indefinitely, and a version of the synced repo is present only for less than a second at a time.
Context
In applications where git-sync is crucial, this can lead to very unpredictable behavior, especially in distributed systems.
I found this error while using git-sync in Airflow applications, where the inconsistency caused serious errors.
Additional Details
The issue first appeared when using an NFS mount and the NFS server restarted suddenly, which led to a temporary loss of the synced directories.
Here is a docker-compose environment that can help to reproduce the issue:
services:
  nfs:
    image: itsthenetwork/nfs-server-alpine:latest
    restart: "no"
    privileged: true
    container_name: git-sync-nfs
    environment:
      SHARED_DIRECTORY: /exports
    volumes:
      - nfs-server:/exports
  git-sync:
    image: registry.k8s.io/git-sync/git-sync:v4.0.0
    privileged: true
    entrypoint: /bin/sh
    container_name: git-sync-run
    user: "0:0"
    command:
      - -c
      - |
        chmod -R 777 /git
        mount -v -t nfs -o rw,vers=4 nfs:/ /git
        mkdir -p /git/root
        /git-sync --verbose 9 --repo=https://github.com/kubernetes/git-sync --root=/git/root --period=5s
    restart: "no"
    depends_on:
      - nfs
    volumes:
      - nfs-client:/git
volumes:
  nfs-server:
  nfs-client:
After the containers have been created, temporarily stop the nfs container.
docker compose stop nfs
The git-sync container should fail after some time; sometimes commands get stuck because folders went missing.
After the failure, please restore the nfs service.
docker compose start nfs
After the restore, the infinite add-and-immediately-remove loop starts. I was not able to find out why the first removal is performed; I assume it is because of the nature of git worktree and the fsck check. In any case, git-sync should be able to auto-repair from this state.
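To illustrate what I mean by the nature of git worktree, here is a minimal sketch using plain git only, not git-sync. It assumes a throwaway repository under a temporary directory; the names `repo` and `worktree-demo` are made up for the example. It shows that when a linked worktree's directory disappears, git still keeps administrative files for it until a prune runs. Whether this is what triggers the removal inside git-sync is only my assumption.

```shell
#!/bin/sh
# Sketch: a linked worktree whose directory vanishes leaves stale admin files.
# All paths and names are hypothetical and live under a temp directory.
set -eu

tmp=$(mktemp -d)
repo="$tmp/repo"
wt="$tmp/worktree-demo"
admin="$repo/.git/worktrees/worktree-demo"

git init -q "$repo"
git -C "$repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

# Create a linked worktree, then simulate its sudden loss.
git -C "$repo" worktree add -q --detach "$wt"
rm -rf "$wt"

# The administrative entry still exists although the worktree is gone.
if [ -d "$admin" ]; then before_prune=stale; else before_prune=gone; fi

# Pruning removes the administrative files of the missing worktree.
git -C "$repo" worktree prune
if [ -d "$admin" ]; then after_prune=stale; else after_prune=gone; fi

echo "before prune: $before_prune, after prune: $after_prune"
rm -rf "$tmp"
```

On my understanding of git, this prints `before prune: stale, after prune: gone`, so any recovery logic has to cope with metadata that points at a directory that no longer exists.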
This error is not present in versions earlier than v4.0.0; I checked v3.6.9 and it works correctly.
The Git Worktree documentation has a note about using NFS or other mounts that are not always available:
If the working tree for a linked worktree is stored on a portable device or network share which is not always mounted, you can prevent its administrative files from being pruned by issuing the git worktree lock command, optionally specifying --reason to explain why the worktree is locked.
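As a rough illustration of that note, the following sketch (plain git again, with a hypothetical temporary repository and a worktree named `wt-locked`) locks a worktree before its directory disappears and then runs a prune. It is not something git-sync does today as far as I know; it only shows the behavior the documentation describes.

```shell
#!/bin/sh
# Sketch: a locked worktree keeps its administrative files across a prune.
# All paths and names are hypothetical and live under a temp directory.
set -eu

lock_tmp=$(mktemp -d)
lock_repo="$lock_tmp/repo"
lock_wt="$lock_tmp/wt-locked"

git init -q "$lock_repo"
git -C "$lock_repo" -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "init"

git -C "$lock_repo" worktree add -q --detach "$lock_wt"

# Lock the worktree and record why, as the documentation suggests.
git -C "$lock_repo" worktree lock --reason "stored on a network share" "$lock_wt"

# Simulate the share going away, then prune.
rm -rf "$lock_wt"
git -C "$lock_repo" worktree prune

if [ -d "$lock_repo/.git/worktrees/wt-locked" ]; then
    lock_result=kept
else
    lock_result=pruned
fi

echo "admin files after prune: $lock_result"
rm -rf "$lock_tmp"
```

With the lock in place the administrative files survive the prune, which is why a lock option could help setups where the worktree lives on storage that can vanish temporarily.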
But I think this can be addressed as a separate issue, because I can reproduce the problem without an external mount.
However, I believe it would be good to have a flag that enables locking of worktrees for setups that use non-standard mounts.
In this case I think it would not matter because both the worktree and the main repo are on the same volume.
It depends on how the sync is used. Someone could still have a process or other cleanup, with a different kind of storage, that causes this behavior, but I agree this would be very rare. Still, I think the proposed fix can prevent this situation and does no harm.