File too large in the GIT history
garrapato opened this issue · 7 comments
I have noticed that the repository contains a very large file in its Git history. Could someone have committed a file or directory by mistake, for example node_modules or something similar? Or is that really the correct size of the repository?
I just made a clone of the repository and it took a long time to download it.
The file I refer to in a fresh clone of the repository is:
scratch-gui/.git/objects/pack/pack-c81e535f2cf1cd650ef7a6e69553ee444473a465.pack
Expected Behavior
Less time to clone (download) the repo
Actual Behavior
I just made a clone of the repository and it took a very long time to download it.
Steps to Reproduce
$ git clone https://github.com/LLK/scratch-gui.git
Cloning into 'scratch-gui'...
remote: Enumerating objects: 48, done.
remote: Counting objects: 100% (48/48), done.
remote: Compressing objects: 100% (47/47), done.
Receiving objects: 100% (67966/67966), 11.68 MiB | 815.00 KiB
$ cd scratch-gui
$ du -ch . | grep "G\t"
1.0G ./.git/objects/pack
1.1G ./.git/objects
1.1G ./.git
1.1G .
1.1G total
$ cd ./.git/objects/pack
$ ll
total 2199344
-r--r--r-- 1 garrapato staff 1904120 Sep 1 06:09 pack-c81e535f2cf1cd650ef7a6e69553ee444473a465.idx
-r--r--r-- 1 garrapato staff 1111074667 Sep 1 06:24 pack-c81e535f2cf1cd650ef7a6e69553ee444473a465.pack
The file already measures more than 1 GB!
Possible solution
If a file or directory was committed by mistake, it must be deleted from the history. The following article shows how to do it:
Removing sensitive data from a repository
Operating System and Browser
Mac OS 10.11.14
Chrome Version 76.0.3809.132 (Official Build) (64-bit)
I hope this information will be useful
Regards
Maybe this is because of the tutorial animated GIFs. We recommend using --depth 10 when cloning this repo.
In general, this repo's lengthy commit history will make a full clone take an incredible amount of time. I recommend --depth 1 rather than 10 because it really can get to be too much.
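For anyone who wants to see what --depth does before trying it on the real repo, here is a minimal sketch. It builds a throwaway local repository (hypothetical demo data, not scratch-gui) and shallow-clones it. Against the real remote the equivalent would presumably be `git clone --depth 1 https://github.com/LLK/scratch-gui.git`.

```shell
set -eu

# Throwaway repository standing in for the real remote (demo data only).
work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 1 2 3 4 5; do
    echo "revision $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done

# Shallow clone: only the most recent commit is transferred.
# --depth is ignored for plain local-path clones, hence the file:// URL.
git clone -q --depth 1 "file://$work/origin" "$work/shallow"

full_count=$(git -C "$work/origin" rev-list --count HEAD)
shallow_count=$(git -C "$work/shallow" rev-list --count HEAD)
is_shallow=$(git -C "$work/shallow" rev-parse --is-shallow-repository)
echo "full history:  $full_count commits"
echo "shallow clone: $shallow_count commit(s), shallow=$is_shallow"
```

The shallow clone only knows about the newest commit, which is why the download shrinks so much when the history is long.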
There are indeed a large number of static image files. There's not anything we can do about the size of the history without causing a lot of conflicts.
Do maintainers typically just take the time/space for a full clone? I was under the impression you can't branch/commit/pull on a shallow clone.
I just did that. Looks like it's 1.9GB on disk and took about 33min on my connection for a full clone.
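On the shallow-clone question: as far as I can tell, creating branches and commits does work in a shallow clone on current Git versions, and `git fetch --unshallow` pulls in the rest of the history later. A minimal sketch on a throwaway local repository (hypothetical demo data; the branch name `my-feature` is made up):

```shell
set -eu

# Throwaway "remote" with three commits (demo data only).
work=$(mktemp -d)
git init -q "$work/origin"
git -C "$work/origin" config user.email "demo@example.com"
git -C "$work/origin" config user.name "Demo"
for i in 1 2 3; do
    echo "revision $i" > "$work/origin/file.txt"
    git -C "$work/origin" add file.txt
    git -C "$work/origin" commit -q -m "commit $i"
done

# Shallow clone containing only the newest commit.
git clone -q --depth 1 "file://$work/origin" "$work/shallow"
cd "$work/shallow"
git config user.email "demo@example.com"
git config user.name "Demo"

# Branching and committing work on top of the truncated history.
git checkout -q -b my-feature
echo "local change" >> file.txt
git commit -q -a -m "work done in a shallow clone"
before=$(git rev-list --count HEAD)

# Fetch the remaining history, turning this into a full clone.
git fetch -q --unshallow
after=$(git rev-list --count HEAD)
is_shallow=$(git rev-parse --is-shallow-repository)
echo "commits visible before unshallow: $before, after: $after, shallow=$is_shallow"
```

So a shallow clone can be a starting point, with the full history fetched only if it turns out to be needed.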
I was curious so I tried this https://stackoverflow.com/a/42544963/69002
It seems like what's taking up a lot of the space is dependencies being committed to the gh-pages branch. Like d33ef36, for example.
Those lib.min.js files seem to be 15MB-20MB each and get committed a few times a day in a few different subdirectories.
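For reference, a sketch of how the largest blobs in a repository's history can be listed. This is written in the spirit of the linked Stack Overflow answer, not copied from it, and it runs against a throwaway repository with a generated 200000-byte `lib.min.js` as a stand-in (demo data, not the real file). In a real clone, only the pipeline in the middle would be needed.

```shell
set -eu

# Throwaway repository with one small and one large file (demo data only).
work=$(mktemp -d)
git init -q "$work/repo"
cd "$work/repo"
git config user.email "demo@example.com"
git config user.name "Demo"
echo "small" > readme.txt
head -c 200000 /dev/zero > lib.min.js
git add readme.txt lib.min.js
git commit -q -m "add files"

# Walk every object reachable from any ref, keep only blobs,
# and print "size object-id path" sorted with the largest first.
largest=$(git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    sed -n 's/^blob //p' |
    sort -rn |
    head -n 10)
echo "$largest"
```

Because `--all` covers every ref, blobs that only exist on a branch such as gh-pages show up too, as long as that branch has been fetched.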
With the insight that big files are only in the gh-pages branch (which is usually disconnected from main), the situation can be improved by using --single-branch when cloning.
~/code/scratch
❯ git clone https://github.com/LLK/scratch-gui --single-branch
Cloning into 'scratch-gui'...
remote: Enumerating objects: 44888, done.
remote: Counting objects: 100% (56/56), done.
remote: Compressing objects: 100% (26/26), done.
remote: Total 44888 (delta 38), reused 44 (delta 30), pack-reused 44832
Receiving objects: 100% (44888/44888), 313.38 MiB | 4.37 MiB/s, done.
Resolving deltas: 100% (29577/29577), done.
~/code/scratch [⏱ 1m14s]
❯ du -hs scratch-gui
392M scratch-gui
~1 minute and ~400MB seem acceptable.
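If gh-pages is needed later, a --single-branch clone can be widened afterwards with `git remote set-branches --add`. A minimal sketch on a throwaway local repository (hypothetical demo data; only the branch name gh-pages is taken from this thread):

```shell
set -eu

# Throwaway "remote" with a source branch and an unrelated gh-pages branch.
work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
git config user.email "demo@example.com"
git config user.name "Demo"
echo "source" > app.js
git add app.js
git commit -q -m "source commit"
main_branch=$(git symbolic-ref --short HEAD)

git checkout -q --orphan gh-pages
git rm -q -rf .
echo "built output" > lib.min.js
git add lib.min.js
git commit -q -m "deploy commit"
git checkout -q "$main_branch"

# Clone only the default branch; gh-pages is not fetched.
git clone -q --single-branch "file://$work/origin" "$work/single"
cd "$work/single"
before=$(git branch -r | grep -c 'origin/gh-pages' || true)

# Later: add gh-pages to the tracked branches and fetch it.
git remote set-branches --add origin gh-pages
git fetch -q origin
after=$(git branch -r | grep -c 'origin/gh-pages' || true)
echo "gh-pages tracked before: $before, after: $after"
```

That keeps the everyday clone small while leaving the large branch one fetch away.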