openbikesensor/openbikesensor.github.io

Optimize repository size


pReya commented

The repo is currently 520.80 MiB (without submodules) and takes a considerable amount of time to clone.

My suggestion would be:

  • Identify large files, and see if they are still used
  • Move large binary files to Git LFS
  • Optimize all images in the project to sane default sizes
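For the first bullet, here is a minimal sketch of how the largest objects in the history could be listed. It wraps the common `git rev-list --objects --all | git cat-file --batch-check` recipe in Python. The function names `largest_blobs` and `scan_repo` are hypothetical. The script only reports sizes and paths; it does not decide whether a file is still used.

```python
import subprocess

# Format string for `git cat-file --batch-check`; %(rest) echoes the path
# that `git rev-list --objects` printed after each object name.
BATCH_FORMAT = "%(objecttype) %(objectname) %(objectsize) %(rest)"


def largest_blobs(batch_check_output: str, limit: int = 20):
    """Parse batch-check lines into (size, sha, path) tuples, largest first."""
    blobs = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 3)  # the path (4th field) may contain spaces
        if len(parts) < 3 or parts[0] != "blob":
            continue  # skip commits, trees and tags
        path = parts[3] if len(parts) == 4 else ""
        blobs.append((int(parts[2]), parts[1], path))
    blobs.sort(reverse=True)
    return blobs[:limit]


def scan_repo(repo: str = ".", limit: int = 20):
    """List the largest blobs reachable from any ref in the repository."""
    objects = subprocess.run(
        ["git", "-C", repo, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
    checked = subprocess.run(
        ["git", "-C", repo, "cat-file", f"--batch-check={BATCH_FORMAT}"],
        input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return largest_blobs(checked, limit)
```

Running `for size, sha, path in scan_repo(): print(size, path)` inside a clone would print the list. Because `--all` covers every ref, it would also surface files that only exist on other branches, such as an orphan branch.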

We have an orphan branch of the old web page in the repository's history, which we decided to keep in a separate repository.
Splitting the repository will be the first step in reducing its size.

An archive of the old history has been stored at https://github.com/openbikesensor/archive.openbikesensor.github.io.

Cleanup of the current repository will follow.

All images in content/docs/hardware/v00.02/build-instructions/images have a resolution of 4128x2322 and a total size of around 370 MB. I would suggest resizing them to 1720x968 (about 42 % of the original width and height), which would reduce the size by about 300 MB.
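To double-check the proposed dimensions, the small sketch below scales a width/height pair to a target width while keeping the aspect ratio. The function names are illustrative. For 4128x2322 and a target width of 1720, it gives 1720x968. That is about 42 % of the original width and height and about 17 % of the original pixel count, so the pixel data shrinks by roughly a factor of six. The actual resizing would be done with whatever image tool we pick; this only covers the arithmetic.

```python
def scaled_size(width: int, height: int, target_width: int) -> tuple[int, int]:
    """Scale (width, height) to target_width, keeping the aspect ratio.

    The height is rounded half up with integer arithmetic, which avoids
    float rounding surprises.
    """
    target_height = (2 * height * target_width + width) // (2 * width)
    return target_width, target_height


def scale_factors(width: int, height: int, target_width: int) -> tuple[float, float]:
    """Return (linear factor, pixel-count factor) for the resize."""
    new_w, new_h = scaled_size(width, height, target_width)
    return new_w / width, (new_w * new_h) / (width * height)


# The build-instruction photos: 4128x2322 resized to 1720 px wide.
print(scaled_size(4128, 2322, 1720))  # (1720, 968)
linear, pixels = scale_factors(4128, 2322, 1720)
print(f"{linear:.0%} per side, {pixels:.0%} of the pixels")  # 42% per side, 17% of the pixels
```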

To effectively reduce the repository size, we need to replace the images in the initial commit of the main branch with their resized versions. This rewrites the whole history and invalidates all existing local clones.

For safety, I would first push the rewritten repository to a temporary new one. Only after cross-checking it would I push the changes to this repository.

@opatut Any objections to this procedure? Or other suggestions?

pReya commented

@SubOptimal Have you considered moving all photos to Git LFS? Photos are binary files after all, so they do not need to be in the Git history at all (except for text-based formats such as SVG). We could move them to LFS and remove them from the history altogether. With LFS, the Git history only stores small pointer files, while the actual file contents live on the LFS server. https://notiz.dev/blog/migrate-git-repo-to-git-lfs

Good explanation of LFS:
https://www.youtube.com/watch?v=9gaTargV5BY

gluap commented

If LFS is seamless for the user, I'd be in favour. It seems we could automatically have all JPGs stored in LFS, effectively keeping the repo small in the future.
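Routing all JPGs to LFS automatically is what a `.gitattributes` entry does: `git lfs track "*.jpg"` writes a line of the form `*.jpg filter=lfs diff=lfs merge=lfs -text`. The sketch below only generates such lines for a list of patterns. The pattern list is an assumption. Depending on the platform, attribute patterns may be matched case-sensitively, so upper-case variants are included to be safe. Moving files that are already in the history into LFS would be a separate step, for example with `git lfs migrate import`.

```python
# Attributes that `git lfs track` assigns to each tracked pattern.
LFS_ATTRIBUTES = "filter=lfs diff=lfs merge=lfs -text"

# Hypothetical selection of photo patterns; adjust to what the repo contains.
PHOTO_PATTERNS = ["*.jpg", "*.jpeg", "*.JPG", "*.JPEG"]


def lfs_gitattributes(patterns: list[str]) -> str:
    """Render .gitattributes lines that route each pattern through Git LFS."""
    return "".join(f"{pattern} {LFS_ATTRIBUTES}\n" for pattern in patterns)


print(lfs_gitattributes(PHOTO_PATTERNS), end="")
```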

@pReya As long as git-lfs is not part of vanilla Git, there might be some issues:

  • users first need to install it to be able to download files from the LFS server; otherwise, they only get a placeholder file such as
      version https://git-lfs.github.com/spec/v1
      oid sha256:23dc...
      size 65318
  • users with a package manager can probably install git-lfs with something like apt-get install git-lfs, but what about Git GUI clients or embedded Git implementations?
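To make the first point concrete: a file checked out without git-lfs installed contains only the short pointer text shown above instead of the image. A check like the following could be used in a build script to warn about such placeholders instead of silently publishing broken images. This is a sketch. The function names and the 1024-byte cut-off (pointer files are only a few lines long) are assumptions.

```python
from pathlib import Path
from typing import Optional

LFS_SPEC_V1 = "https://git-lfs.github.com/spec/v1"
MAX_POINTER_BYTES = 1024  # assumption: real pointers are far smaller than this


def parse_lfs_pointer(data: bytes) -> Optional[dict[str, str]]:
    """Return the pointer's key/value fields, or None if `data` is real content."""
    if len(data) > MAX_POINTER_BYTES:
        return None
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None  # binary data, e.g. an actual JPEG
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" ")
        if not sep:
            return None
        fields[key] = value
    return fields if fields.get("version") == LFS_SPEC_V1 else None


def find_placeholders(root: str, pattern: str = "*.jpg") -> list[Path]:
    """List files under `root` that are still LFS pointers instead of images."""
    hits = []
    for path in Path(root).rglob(pattern):
        with path.open("rb") as f:
            head = f.read(MAX_POINTER_BYTES + 1)  # one extra byte flags big files
        if parse_lfs_pointer(head) is not None:
            hits.append(path)
    return hits
```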

If we agree that we can handle the above, I will push the LFS-migrated repo.

pReya commented

LFS is included in:

  • Git for Windows
  • GitHub Desktop
  • Tower
  • GitKraken

But yes, for Linux and macOS command-line users, it will mean an additional installation step via their package managers.

I still think LFS is the right tool for this job. The number of images in the repo is only going to grow, and every time an image is changed, the whole new file is added to the Git history, so the history grows quickly.

The GitHub LFS offer is pretty bad IMO: the quota is just tiny, and there seems to be no exception even for open source projects.

https://docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage

There is 1 GB of storage and 1 GB of traffic per month. This is for the whole organization. You can pay to get more, but 1 GB is not nearly enough for us. It looks like we're already using half of it somewhere:

[screenshot of the current LFS usage]

You get 50 GB of storage and 50 GB of traffic for 5 USD/month. That's rather expensive, if you ask me. I'd stick with the plain repo: its size is no more than an inconvenience, and once we clean up the history it will be much better too. At least GitHub is known not to flag open source repos if they grow big ;)
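To put the 1 GB bandwidth quota in perspective, here is a back-of-the-envelope calculation. It assumes that the roughly 370 MB of build-instruction photos mentioned earlier were moved to LFS, that every fresh clone downloads all of them once, and that 1 GB counts as 1024 MB. These are assumptions, not measured numbers.

```python
def full_clones_per_month(quota_mb: float, payload_mb: float,
                          already_used_mb: float = 0.0) -> int:
    """How many complete downloads of `payload_mb` fit into the monthly quota."""
    remaining = max(quota_mb - already_used_mb, 0.0)
    return int(remaining // payload_mb)


QUOTA_MB = 1024          # 1 GB of LFS bandwidth per month
PHOTOS_MB = 370          # current size of the build-instruction photos
RESIZED_MB = 370 - 300   # after the proposed resize (~300 MB saved)

print(full_clones_per_month(QUOTA_MB, PHOTOS_MB))                       # 2
print(full_clones_per_month(QUOTA_MB, PHOTOS_MB, already_used_mb=512))  # 1
print(full_clones_per_month(QUOTA_MB, RESIZED_MB))                      # 14
```

So with the photos at their current size, the quota would cover only a couple of full clones per month. Even with resized photos, about a dozen clones would use it up.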

pReya commented

The Github LFS offer is pretty bad IMO, the quota is just tiny, even for open source projects it seems they have no exception.

docs.github.com/en/repositories/working-with-files/managing-large-files/about-storage-and-bandwidth-usage

There is 1 GB of storage and 1 GB of traffic per month. This is for the whole organization. You can pay to get more, but 1 GB is not nearly enough for us. It looks like we're already using half of it somewhere:

image

You get 50 GB storage and 50 GB traffic for 5 USD/mo. That's rather expensive, if you ask me. I'd stick with the repo, it is not more than an inconvenience, and if we clean up the history it'll be much better too. Github is at least known not to flag open source repos if they grow big ;)

Now that OBS is an "Eingetragener Verein" (registered association), we could apply for non-profit status at GitHub, which would give us a free "Team" license with more liberal usage limits: https://support.github.com/contact/nonprofit

EDIT: It turns out that even the "Team" plan only offers 1 GB of data, so your point is absolutely correct. This is indeed a very stupid limitation. So let's proceed without LFS and just add images to the repo as before?