recursive clone of ocrd_fileformat bloats docker with assets
Closed this issue · 4 comments
Line 356 in d8830fb
This causes all docker images to contain a complete checkout of the assets test data repository. (Twice even, because of the .git index.)
Correction: 4 times:
/build/.git/modules/ocrd_fileformat/modules/ocr-fileformat/modules/vendor/page-to-alto/modules/repo/assets
/build/.git/modules/ocrd_fileformat/modules/assets
/build/ocrd_fileformat/repo/ocr-fileformat/vendor/page-to-alto/repo/assets
/build/ocrd_fileformat/repo/assets
Note: all of these are complete checkouts. They sum up to 630 MB.
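For anyone who wants to reproduce numbers like these, here is a minimal sketch that walks a tree, finds every directory named assets, and sums the file sizes below each. The helper names (dir_size, find_assets_dirs, report) are made up for illustration and are not part of any OCR-D tooling. The /build root mentioned afterwards is taken from the paths listed above.

```python
import os


def dir_size(path):
    """Return the total size in bytes of regular files below path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so that linked files are not counted twice.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def find_assets_dirs(top, name="assets"):
    """Return a sorted list of (path, size) for directories called name."""
    found = []
    for root, dirs, _files in os.walk(top):
        if name in dirs:
            # Do not descend into a copy that is already being counted.
            dirs.remove(name)
            path = os.path.join(root, name)
            found.append((path, dir_size(path)))
    return sorted(found)


def report(top):
    """Format one line per copy found, plus a total in MB."""
    found = find_assets_dirs(top)
    lines = ["%8.1f MB  %s" % (size / 1e6, path) for path, size in found]
    total = sum(size for _path, size in found)
    lines.append("%8.1f MB  total in %d copies" % (total / 1e6, len(found)))
    return "\n".join(lines)
```

Calling print(report("/build")) inside the image should list each copy with its size. Hidden directories such as .git are walked too, which is what makes the copies under .git/modules show up.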
Good point. The size and the multiple copies of assets have been irking me for a while.
Ideally, we should reduce the size of the repo, i.e. the 20 MB TIFF in dfki-testdata and the size of the git index. Perhaps we should move away from integrating assets as a submodule and use GH releases in the make assets recipes.
At least we don't add the assets to the already large docker image...
Well, for now, at least for the Docker build, adding **/assets/* to .dockerignore saves about 1 GB.
For native installation, I'm not sure how we can avoid duplicating the checkout, though. Using GH releases or artifacts for make test could be one way.