buildingSMART/Sample-Test-Files

Improving sample file repository - Large size

jmirtsch opened this issue · 4 comments

I am having problems using this repository, and I'm wondering whether others experience something similar (or whether it acts as a barrier to other users).

A couple of times the git repo has become "corrupt" and I've had to clone it anew. I've just done so again: the clone took an extremely long time and consumes 9GB of hard disk space. If this discourages me from using it, I'm sure other interested users might give up before they start.

There are 3 large IFC2x3 samples that have been added; Schependomlaan by itself is over 4GB including PDFs and the like. This sets a precedent, and if more are added it might well render this repository unusable.

Perhaps an independent working repository should be created for working projects such as the infrastructure extension, and then they can be added to this repository. Alternatively, I'd suggest these large sample projects might be better distributed elsewhere. Personally I'd foresee these being static, without improvements or changes, so there isn't a strong need to have them in a version control repository.

Maybe this is just a problem for users like myself in Australia, with latency affecting internet speeds. Hard disks are cheap and data centres aplenty. But for me, any small barrier that discourages a potential implementer from using a resource such as this is not a good thing. And it's not ideal for those wanting to help test a project extension.

We (the TUM group) can confirm these issues. Thus, it doesn't relate to internet speed issues but rather to the repository content itself (as described).
@pjanck @hu-stefan fyi

Hi, are you using Git LFS? It is supposed to help with the issues you mentioned when a repo contains large files. In a simple sense it operates like the on-demand part of OneDrive, only downloading large files when you want to access them. This might help on the space front.
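To make the "on demand" idea a bit more concrete, here is a minimal sketch of the small pointer file that Git LFS commits in place of a large file. The function name `lfs_pointer` and the sample content are made up for illustration, and this is not how the LFS client itself is implemented; it only reproduces the three-line pointer layout (version, oid, size).

```python
import hashlib


def lfs_pointer(content: bytes) -> str:
    """Build the text of an LFS-style pointer for `content` (hypothetical helper)."""
    oid = hashlib.sha256(content).hexdigest()  # content hash identifying the real file
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(content)}\n"
    )


# Stand-in for a large IFC file (assumption: any bytes will do for the sketch).
sample = b"ISO-10303-21;\n" * 1000
pointer = lfs_pointer(sample)

print(pointer)
print(len(sample), "bytes of content vs", len(pointer.encode()), "bytes of pointer")
```

The repository history then only carries the pointer text, and the real content is fetched from the LFS store when the file is checked out.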

The only other advice would be to use git submodules: this repository stays a complete set of all samples, and a separate repository is created for each IFC version set (e.g. IFC4X3) and referenced as a subproject (git submodule is the command; more info: Git Tools - Submodules). This means you can either clone the specific subproject for the version you're working on, or clone this project and select which version subprojects you want/need, reducing clone time and disk use without losing the single source location.

Thoughts?

As far as I understand, the problem lies in the git history from before the introduction of Git LFS: there were ~4GB of additions (before LFS) and ~4GB of deletions (after introducing LFS), which are burned into the git history.
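A rough way to see why the old additions keep costing space: in a toy model where every commit is a snapshot and every file version is stored once by its hash, replacing a big file with a small LFS pointer shrinks the checkout but not the history. The class `ToyRepo`, the file name and the sizes below are all invented for the sketch (4000 bytes stands in for ~4GB), and real git additionally compresses and packs objects.

```python
import hashlib


class ToyRepo:
    """Toy content-addressed store; a simplification, not git's real object model."""

    def __init__(self):
        self.blobs = {}    # blob id -> size in bytes
        self.commits = []  # each commit is a snapshot: path -> blob id

    def commit(self, files):
        """Record a snapshot; `files` maps path -> bytes content."""
        snapshot = {}
        for path, data in files.items():
            oid = hashlib.sha1(data).hexdigest()
            self.blobs[oid] = len(data)
            snapshot[path] = oid
        self.commits.append(snapshot)

    def history_size(self):
        """Bytes needed to keep every blob reachable from any commit."""
        reachable = {oid for snap in self.commits for oid in snap.values()}
        return sum(self.blobs[oid] for oid in reachable)

    def checkout_size(self):
        """Bytes in the latest snapshot only."""
        return sum(self.blobs[oid] for oid in self.commits[-1].values())


repo = ToyRepo()
big_file = b"x" * 4000                     # stand-in for the ~4GB sample
lfs_stub = b"pointer to content stored elsewhere\n"

repo.commit({"sample.ifc": big_file})      # large file committed before LFS
repo.commit({"sample.ifc": lfs_stub})      # later replaced by a small pointer

print("checkout:", repo.checkout_size(), "bytes")
print("history: ", repo.history_size(), "bytes")
```

In this model the only way to get the history size back down is to drop or rewrite the commits that reference the big blob, which is consistent with starting from a fresh repository.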

The decision has been made: we start a new repository.