bingoohuang/blog

Removing specific files/directories from a git repository's history

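Xiaoyong (@lvyong1985) told me that cloning bssh sat there for ages with no response, and asked whether the history had grown too large. Thinking it over, the history once contained a vendor directory with all of the dependencies in it. I deleted that directory in a later commit, but a full clone still downloads it because it remains in the history. The result after removal: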
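To make the problem concrete, here is a minimal sketch in a throwaway repository (the file names and identity are hypothetical, not taken from bssh). It shows that a directory removed in a later commit disappears from the working tree but is still referenced by history.

```shell
#!/bin/sh
# Demo in a throwaway repo (hypothetical names): a directory deleted in a
# later commit is still stored in history.
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name demo

mkdir vendor
echo "dependency code" > vendor/dep.go
git add vendor
git commit -q -m "add vendor"

git rm -r -q vendor
git commit -q -m "remove vendor"

# The working tree no longer has vendor/, but commits still reference it.
history_hits=$(git log --all --oneline -- vendor/ | wc -l | tr -d ' ')
echo "commits touching vendor/: $history_hits"
```

Both the commit that added vendor/ and the one that removed it are counted, which is why a full clone still has to fetch the old blobs.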

Removal script:

# Remove vendor/ (the directory to delete) from all commits, then remove the refs to the old commits
# (repeat these two commands for each directory you want to remove)
git filter-branch --index-filter 'git rm -rf --cached --ignore-unmatch vendor/' --prune-empty --tag-name-filter cat -- --all
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d

# Ensure all old refs are fully removed
rm -Rf .git/logs .git/refs/original

# Perform a garbage collection to remove commits with no refs
git gc --prune=all --aggressive

# Force push all branches to overwrite their history
# (use with caution!)
git push origin --all --force
git push origin --tags --force
$ git count-objects -vH    [Fri 4/17 11:51:19 2020]
count: 12
size: 48.00 KiB
in-pack: 6349
packs: 2
size-pack: 106.54 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
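The removal script above can be rehearsed on a throwaway repository before it is run against a real one. A minimal sketch, assuming a git version that still ships `filter-branch`: the repository contents are hypothetical, `FILTER_BRANCH_SQUELCH_WARNING` only silences the deprecation notice newer git versions print, and `git reflog expire` is swapped in for deleting `.git/logs` by hand.

```shell
#!/bin/sh
# Rehearsal in a throwaway repo (hypothetical contents).
set -e
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name demo

mkdir vendor
echo "dependency code" > vendor/dep.go
echo "package main" > main.go
git add .
git commit -q -m "add code and vendor"
git rm -r -q vendor
git commit -q -m "remove vendor"

# Rewrite every commit so that vendor/ never existed.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch -f \
  --index-filter 'git rm -rf --cached --ignore-unmatch vendor/' \
  --prune-empty --tag-name-filter cat -- --all >/dev/null 2>&1

# Drop the backup refs and reflogs, then collect the unreachable objects.
git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

remaining=$(git log --all --oneline -- vendor/ | wc -l | tr -d ' ')
commits=$(git rev-list --all --count)
echo "commits touching vendor/: $remaining, total commits: $commits"
```

Because of `--prune-empty`, the "remove vendor" commit becomes empty after the rewrite and is dropped, so only the first commit survives.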

See also: "Permanently remove files and folders from Git repo" and "Removing sensitive data from a repository".

➜  bssh git:(master) ls -lh .git/objects/pack    [Wed 4/22 10:01:10 2020]
total 218112
-r--r--r--  1 bingoobjca  staff   173K  4 22 10:01 pack-e2146fcd2f47806d27b509480353f94d54ee9a01.idx
-r--r--r--  1 bingoobjca  staff   106M  4 22 10:01 pack-e2146fcd2f47806d27b509480353f94d54ee9a01.pack
➜  bssh git:(master) git rev-list --all | xargs -L1 git ls-tree -r --long | sort -uk3 | sort -rnk4 | head -10    [Wed 4/22 10:01:28 2020]
100644 blob 4443a37a8ee9ac0d4155653dadff7d373f5dfafb 24152821	example/lssh.gif
100644 blob 3ee69feda21ef726ce016593eff199c53321b364 19373053	lssh.exe
100644 blob 54a2efc8577e1773f9965859d6692c98a178e770 10486248	images/lssh.gif
100644 blob 54a2efc8577e1773f9965859d6692c98a178e770 10486248	example/lssh.gif
100644 blob 7b9759e68bc86b54602f4600b4700acb68cb210d 9899474	example/lssh_iterm2.gif
100644 blob 7b9759e68bc86b54602f4600b4700acb68cb210d 9899474	example/lssh.gif
100644 blob 82c8854c56a0b5ff94749387d5feb448553da4c5 9180212	images/3-1.gif
100644 blob 82c8854c56a0b5ff94749387d5feb448553da4c5 9180212	example/3-1.gif
100644 blob 0146b1a27b61d36d9b80c49ab92186951309a220 7913311	example/lssh.gif
100755 blob 2ab34cfa061e26c14355ee196128816249cee4a9 5608848	lssh
➜  bssh git:(master) git count-objects -vH    [Wed 4/22 10:04:04 2020]
count: 0
size: 0 bytes
in-pack: 6286
packs: 1
size-pack: 106.50 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
➜  bssh git:(master) git count-objects -vH    [Wed 4/22 10:07:30 2020]
count: 0
size: 0 bytes
in-pack: 6127
packs: 1
size-pack: 2.43 MiB
prune-packable: 0
garbage: 0
size-garbage: 0 bytes
➜  bssh git:(master) ls -lh .git/objects/pack    [Wed 4/22 10:08:08 2020]
total 4992
-r--r--r--  1 bingoobjca  staff   169K  4 22 10:07 pack-d2104f3720ceb5285ea676ab65b51d53ed52c062.idx
-r--r--r--  1 bingoobjca  staff   2.3M  4 22 10:07 pack-d2104f3720ceb5285ea676ab65b51d53ed52c062.pack
➜  bssh git:(master) git rev-list --all | xargs -L1 git ls-tree -r --long | sort -uk3 | sort -rnk4 | head -10    [Wed 4/22 10:08:13 2020]
100644 blob 2b4cea5b94a1c00a808b784bfd850764c24b6d54  678907	vendor/golang.org/x/sys/windows/zerrors_windows.go
100644 blob b861ec78347240a87e085ec5c6383bf944284b73  130041	vendor/golang.org/x/sys/unix/zerrors_linux_ppc64le.go
100644 blob 7586a134ef754cbf99e3c62ed76d1f87663fdba1  130039	vendor/golang.org/x/sys/unix/zerrors_linux_ppc64.go
100644 blob c1e95e29cbcafc1e1c53be395549f056348e30a9  129886	vendor/golang.org/x/sys/unix/zerrors_linux_sparc64.go
100644 blob f6c99164ffcca243b5c8009ca865e48f8650662a  129789	vendor/golang.org/x/sys/unix/zerrors_linux_s390x.go
100644 blob 02938cb6ed45744f2ee4310d321c7b55795b8b7a  127634	vendor/golang.org/x/sys/unix/zerrors_linux_ppc64le.go
100644 blob ebaca417b461308438f18904a1b6e08881ada125  127632	vendor/golang.org/x/sys/unix/zerrors_linux_ppc64.go
100644 blob d3f6e9065249914d3265d92f7eda8f7b56b5326e  127542	vendor/golang.org/x/sys/unix/zerrors_linux_mips64le.go
100644 blob 7275cd876b317fafe03612e2e2c0e11b2889f4a1  127541	vendor/golang.org/x/sys/unix/zerrors_linux_mipsle.go
100644 blob efba3e5c9df07ab2b73c4ac8fef9c94dea4764cf  127540	vendor/golang.org/x/sys/unix/zerrors_linux_mips64.go
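  1. List every object in the repository (SHA, size, path, and so on), sort by size in descending order, and show the top 10. Source: "Completely removing large files from git (including the commit history)"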

    Run in the project root: git rev-list --all | xargs -rL1 git ls-tree -r --long | sort -uk3 | sort -rnk4 | head -10

  2. Check sizes inside the pack files (the command lists the three largest objects)

    git verify-pack -v .git/objects/pack/pack-*.idx | sort -k 3 -g | tail -3
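The listing in step 1 runs `git ls-tree` once per commit, which gets slow on long histories. A possible alternative, sketched here as a hypothetical helper named `largest_blobs`, walks all reachable objects once and asks `git cat-file --batch-check` for their sizes. The sizes are uncompressed object sizes, as in `ls-tree --long`. The throwaway repository at the end exists only to exercise the function.

```shell
#!/bin/sh
set -e

# largest_blobs N: print the N largest blobs reachable from any ref,
# as "<size-in-bytes> <sha> <path>", biggest first (N defaults to 10).
largest_blobs() {
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    sed -n 's/^blob //p' |
    sort -rn -k1,1 |
    head -n "${1:-10}"
}

# Throwaway repo with one large and one small file (hypothetical names).
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name demo
head -c 2000 /dev/zero > big.bin
echo "hi" > small.txt
git add .
git commit -q -m "add files"

top=$(largest_blobs 1)
echo "$top"
```

In a real repository, calling `largest_blobs 10` from the project root gives a listing comparable to step 1, with each blob shown once under one of its paths.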

Tools

BFG

# bfg is an alias for java -jar bfg.jar
bfg --delete-files id_{dsa,rsa}  my-repo.git
## Remove files larger than a given size
bfg --strip-blobs-bigger-than 50M  my-repo.git
## Replace text matching the entries in passwords.txt
bfg --replace-text passwords.txt  my-repo.git
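For `--replace-text`, the file lists one expression per line. The sketch below writes a sample `passwords.txt`, assuming the syntax described in BFG's own usage text as I recall it: a bare line is a literal string replaced with `***REMOVED***`, `==>` introduces a custom replacement, and a `regex:` prefix marks a regular expression. All of the values are made up, so check the syntax against your BFG version before relying on it.

```shell
#!/bin/sh
# Write a sample passwords.txt for `bfg --replace-text` (all values are made up).
set -e
cat > passwords.txt <<'EOF'
hunter2
s3cr3t-token==>REDACTED
regex:api_key=\w+==>api_key=
EOF

pattern_count=$(wc -l < passwords.txt | tr -d ' ')
echo "wrote $pattern_count patterns to passwords.txt"
```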

References

  1. BFG Repo-Cleaner

The Easiest Way To Remove Checked In Credentials From A Git Repo

  1. wget https://repo1.maven.org/maven2/com/madgag/bfg/1.13.0/bfg-1.13.0.jar

  2. Create a Password File
    Create a file listing the password strings to look for in the repo. BFG replaces every occurrence of these strings in the repository's history.
    vi passwords.txt

  3. java -jar bfg-1.13.0.jar --replace-text passwords.txt MyFirstProject

  4. git reflog expire --expire=now --all && git gc --prune=now --aggressive

  5. git push --all --force
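Before force-pushing, it is worth confirming that the string is really gone from local history. A minimal sketch using `git log -S`, which finds commits that change the number of occurrences of a string. The helper name `secret_in_history` and the sample values are hypothetical, and the throwaway repository is there only to exercise it.

```shell
#!/bin/sh
set -e

# secret_in_history STRING: succeed if any commit on any ref added or
# removed an occurrence of STRING.
secret_in_history() {
  [ -n "$(git log --all --oneline -S"$1")" ]
}

# Throwaway repo where a password is committed and later blanked out.
demo=$(mktemp -d)
cd "$demo"
git init -q .
git config user.email demo@example.com
git config user.name demo
echo "password=hunter2" > config.ini
git add config.ini
git commit -q -m "add config"
echo "password=" > config.ini
git commit -q -am "scrub password"

if secret_in_history "hunter2"; then leaked=yes; else leaked=no; fi
if secret_in_history "never-committed"; then absent=no; else absent=yes; fi
echo "hunter2 still in history: $leaked"
```

Here the check reports the leak even though the current file is clean, because the scrub commit did not rewrite history. After a successful BFG or filter-branch run plus `git gc`, the same check should come back empty.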

Please keep in mind: if credentials or sensitive files were ever checked in, consider them exposed and change them right away. None of the methods above can prevent misuse if the credentials were already copied somewhere else.