kubernetes/git-sync

sparse checkout not working

harshitasaxena05 opened this issue · 7 comments

Hi, I'm trying to use git-sync with the sparse checkout feature.

I have a repo, https://dummy.git, containing an /output directory with two files: a.txt and b.txt.

I want to download a.txt only.

Below are the env variables and volume mount I have set:

      env:
        - name: GIT_SYNC_REPO
          value: https://dummy.git
        - name: GIT_SYNC_DEST
          value: git-sync
        - name: GIT_SYNC_USERNAME
          value: user
        - name: GIT_SYNC_PASSWORD
          value: pwd
        - name: GITSYNC_SPARSE_CHECKOUT_FILE
          value: "output/a.txt"

      volumeMounts:
        - name: content-from-git
          mountPath: /tmp/git

image version: v3.6.6

Below is the error output -

"msg"="too many failures, aborting" "error"="open /output/a.txt: no such file or directory" "failCount"=1

If I don't set the GITSYNC_SPARSE_CHECKOUT_FILE env variable, it works, cloning the entire repo into /tmp/git/git-sync/.

Can anyone help in resolving this issue? Thanks.

The sparse checkout file is an input file, written in git's specific syntax (https://git-scm.com/docs/git-sparse-checkout). The variable takes the path to that file, not a pattern. For example, in k8s the file might come from a ConfigMap.
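As a rough sketch of the ConfigMap route for the repo described above: the pattern lives in a file, and the env variable points at that file's path. The ConfigMap name `git-sync-sparse`, the key `sparse-checkout`, the volume name `sparse-config`, and the mount path `/etc/git-sync` are all made up for illustration. `GIT_SYNC_SPARSE_CHECKOUT_FILE` is the v3 spelling of the variable, and the pattern assumes gitignore-style matching, so treat this as a starting point rather than a verified config.

```yaml
# Hypothetical ConfigMap holding the sparse-checkout patterns, one per line.
apiVersion: v1
kind: ConfigMap
metadata:
  name: git-sync-sparse            # illustrative name
data:
  sparse-checkout: |
    /output/a.txt
---
# Pod spec fragment (image and the other env vars from the report are omitted).
spec:
  containers:
    - name: git-sync
      env:
        - name: GIT_SYNC_SPARSE_CHECKOUT_FILE
          # A path to the pattern file, not the pattern itself.
          value: /etc/git-sync/sparse-checkout
      volumeMounts:
        - name: content-from-git
          mountPath: /tmp/git
        - name: sparse-config      # illustrative volume name
          mountPath: /etc/git-sync
          readOnly: true
  volumes:
    - name: content-from-git
      emptyDir: {}
    - name: sparse-config
      configMap:
        name: git-sync-sparse
```

With the original value of "output/a.txt", git-sync tried to open that string as a file, which matches the "no such file or directory" error in the report.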

I didn't have a lot of users for sparse-checkout, so we left the UX there. I am open to a nicer UX if we have some real use case.

Also note that the GITSYNC_ variables (as opposed to GIT_SYNC_) exist only in v4, which is not GA yet.

Following up - the current --sparse-checkout-file is calling git sparse-checkout init. I might consider something like a new --sparse-checkout <value> (repeated) flag which calls git sparse-checkout add. The docs for git sparse-checkout claim that this is experimental:

THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE.

So I am a little worried about adding more API on top of it. Then there are all the secondary options (sparse index, cone mode, etc.) that need to be considered.

So I'd want a bit more understanding of the need and some REALLY good e2e tests.

Hello,

We have the following use case:

Files are stored in a git repository, which is pulled into a container using git-sync. The repository might be huge, and we want to download only specific files or folders using the sparse checkout feature.

We were not sure what the value of this env variable should be, since there is only a description available and no example. Based on the example above, could you please tell us what should be set in the 3.x version so that git-sync downloads only specific folders, not the entire repo, into the container?

I don't know if sparse checkout prevents downloading. I think it is only about checkout (which files are present in the worktree, vs. in the "hidden" database).

This flag expects the name of a file which you provide (e.g. in a volume mount) and which is filled with git's sparse-checkout syntax. Try 'git help sparse-checkout'.

You might have better success with --depth=1, though.
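For the setup in the original report, the shallow-clone suggestion might look like the fragment below. This is a sketch that assumes the v3 env-variable spelling `GIT_SYNC_DEPTH` for the `--depth` flag. A shallow clone limits the history that is fetched, not which files end up in the worktree.

```yaml
# Container env fragment (v3 variable names assumed).
env:
  - name: GIT_SYNC_REPO
    value: https://dummy.git
  - name: GIT_SYNC_DEPTH
    # Fetch only the most recent commit instead of the full history.
    value: "1"
```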

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

/lifecycle rotten

@thockin, I am using git-sync v4. How exactly do you get this working? Instead of passing in the file, I am writing this locally during a modified entrypoint script.

Update: Never mind, I figured this out. For anyone who stops here and sees this: add the git sparse-checkout patterns to an input file, write or mount that file into the container, and provide its location either relative to --root or as an absolute path.
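A minimal sketch of that v4 setup, using the absolute-path option. The volume name `sparse-config`, the mount path `/etc/git-sync`, and the root `/git` are hypothetical, and the mounted file is assumed to already contain the sparse-checkout patterns.

```yaml
# Container fragment for git-sync v4 (GITSYNC_ variable names).
env:
  - name: GITSYNC_ROOT
    value: /git                             # illustrative root directory
  - name: GITSYNC_SPARSE_CHECKOUT_FILE
    # Absolute path to the mounted pattern file.
    value: /etc/git-sync/sparse-checkout
volumeMounts:
  - name: sparse-config                     # hypothetical volume with the pattern file
    mountPath: /etc/git-sync
    readOnly: true
```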