mediaforensics/medifor

Clean up large .faiss files from .git

Opened this issue · 2 comments

I noticed some large files were accidentally committed to the repo. This is more of an inconvenience than a major issue, but it means a large download every time you pip install from the git repo, since pip clones it implicitly. Not sure if you are still actively maintaining this, but if you are interested in purging these files, I was able to shrink the directory from ~160MB to 1.8MB with this protocol. Definitely practice with a backup repo and alternate remote! You can check out my results here.

git clone "$repo" /tmp/medifor # $repo is the URL of the repo to clean
cd /tmp/medifor
find . -name '*.faiss' -exec rm {} \; # remove faiss files from the working tree
git add . # stage the deletions
git commit -m "remove large faiss files"

Then comes the fun part. Use dockerized BFG repo cleaner to purge the history:

docker run --rm -it -v "$PWD":/data -w /data soodesune/bfg-repo-cleaner --strip-blobs-bigger-than 10M

That will strip the files from history, but it doesn't fully prune them just yet, so then you run

git reflog expire --expire=now --all && git gc --prune=now --aggressive

to prune and collect garbage, and voila! .git should be much smaller.
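To check the result, something like the following sketch can help. It reports the packed size of a clone and lists the largest blobs still reachable from any ref. The helper names `pack_size` and `largest_blobs` are made up for illustration; the underlying git commands (`count-objects`, `rev-list`, `cat-file --batch-check`) are standard.

```shell
# Hypothetical helpers for inspecting a clone before and after the cleanup.

# Print the on-disk size of the packed objects, in human-readable units.
pack_size() {
  git -C "$1" count-objects -vH | awk -F': ' '$1 == "size-pack" { print $2 }'
}

# List the N largest blobs reachable from any ref as "<bytes> <path>".
# $1 is the repo directory, $2 is N (defaults to 10).
largest_blobs() {
  git -C "$1" rev-list --objects --all |
    git -C "$1" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob"' |
    cut -d' ' -f2- |
    sort -rn |
    head -n "${2:-10}"
}
```

Running `largest_blobs /tmp/medifor 5` before the BFG step shows which files dominate the history, and `pack_size /tmp/medifor` afterwards confirms whether the prune actually took effect.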

rtyley/bfg-repo-cleaner#36

It looks like we can't do this in a repository that contains pull requests. We could work around that by creating a new repo, moving everything over, deleting this one, and then renaming the new one, but I don't know that it's the best idea to engage in that right now.

Unfortunately, that doesn't help you, since pip install doesn't have a --depth option and apparently won't anytime soon.
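One possible workaround, as a rough sketch: do the shallow clone yourself and point pip at the local checkout. The helper name `shallow_clone` and the destination path are made up for illustration, and this assumes the package installs from the repository root.

```shell
# Hypothetical helper: fetch only the latest commit instead of the full history.
# $1 is the repository URL, $2 is the destination directory.
shallow_clone() {
  git clone --quiet --depth 1 "$1" "$2"
}

# Example usage (URL taken from the push log in this thread):
#   shallow_clone https://github.com/mediaforensics/medifor.git /tmp/medifor-shallow
#   pip install /tmp/medifor-shallow
```

This skips the old blobs entirely because a depth-1 clone only downloads objects reachable from the tip commit, at the cost of a two-step install.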

I'm leaving this open in case we get the gumption to fix it or have a bright idea. Meanwhile, what I'm seeing is a lot of this:

$ git push
Enumerating objects: 13, done.
Counting objects: 100% (13/13), done.
Writing objects: 100% (29/29), 6.10 KiB | 6.10 MiB/s, done.
Total 29 (delta 13), reused 13 (delta 13), pack-reused 16
remote: Resolving deltas: 100% (15/15), completed with 10 local objects.
To https://github.com/mediaforensics/medifor.git
 ! [remote rejected] refs/pull/1/head -> refs/pull/1/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/10/head -> refs/pull/10/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/11/head -> refs/pull/11/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/12/head -> refs/pull/12/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/14/head -> refs/pull/14/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/16/head -> refs/pull/16/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/17/head -> refs/pull/17/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/17/merge -> refs/pull/17/merge (deny updating a hidden ref)
 ! [remote rejected] refs/pull/18/head -> refs/pull/18/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/19/head -> refs/pull/19/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/19/merge -> refs/pull/19/merge (deny updating a hidden ref)
 ! [remote rejected] refs/pull/2/head -> refs/pull/2/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/20/head -> refs/pull/20/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/21/head -> refs/pull/21/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/22/head -> refs/pull/22/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/23/head -> refs/pull/23/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/24/head -> refs/pull/24/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/25/head -> refs/pull/25/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/27/head -> refs/pull/27/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/29/head -> refs/pull/29/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/3/head -> refs/pull/3/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/30/head -> refs/pull/30/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/31/head -> refs/pull/31/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/32/head -> refs/pull/32/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/33/head -> refs/pull/33/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/34/head -> refs/pull/34/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/35/head -> refs/pull/35/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/36/head -> refs/pull/36/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/37/head -> refs/pull/37/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/38/head -> refs/pull/38/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/4/head -> refs/pull/4/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/5/head -> refs/pull/5/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/6/head -> refs/pull/6/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/7/head -> refs/pull/7/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/8/head -> refs/pull/8/head (deny updating a hidden ref)
 ! [remote rejected] refs/pull/9/head -> refs/pull/9/head (deny updating a hidden ref)

https://stackoverflow.com/questions/34265266/remote-rejected-errors-after-mirroring-a-git-repository
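For reference, the rejections come from pushing a mirror clone, which carries local copies of GitHub's read-only `refs/pull/*` refs. A minimal sketch of one way to silence them is to delete those local copies before pushing. The helper name `drop_pull_refs` is made up; the git commands are standard.

```shell
# Hypothetical helper: remove the local copies of refs/pull/* from a mirror clone
# so a later push no longer tries to update them. $1 is the repo directory.
drop_pull_refs() {
  git -C "$1" for-each-ref --format='delete %(refname)' refs/pull |
    git -C "$1" update-ref --stdin
}
```

This only cleans up the push output. As far as I understand, it does not shrink what GitHub stores, since the pull request refs on the server still point at the old history.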

Huh, interesting. Yeah, like I said, it isn't a tremendous issue. I just did it because I had to fork the medifor API anyway (because... reasons, frankly inadequate ones), so I figured I'd give it a shot and report my findings to you folks.