absurd amounts of "apt-get update"
The Dockerfile is currently running apt-get update no less than 15 times.
apt update just rebuilds the local cache of the repository index.
The build time is 15 minutes on my CI. Do you expect the repository to change significantly in 15 minutes?
Yes -> Keep the apt updates
No -> Reduce it to one call.
AMEND:
This also blows up the size of the built container significantly. On my CI the resulting image is about half a gigabyte. This is absolutely beyond ridiculous.
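For reference, the kind of consolidation the "reduce it to one call" option points at typically looks like the sketch below; the base image and package names are placeholders, not the project's actual Dockerfile.

```dockerfile
# One RUN layer: refresh the package index once, install everything in one go,
# and remove the index files in the same layer so they are never baked into the
# image. Base image and package list are placeholders for illustration only.
FROM debian:bookworm-slim
RUN apt-get update \
 && apt-get install -y --no-install-recommends \
        git curl ca-certificates python3 \
 && rm -rf /var/lib/apt/lists/*
```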
@tvanrielwendrich Why do you build it in CI? It is meant to be pulled from the Docker registry to CI.
When you use the prebuilt Docker image, you get a fully working test package with no build time.
The idea behind installing so many packages is to have a flexible test environment out of the box.
Why so many updates? Due to the Docker caching system.
Any excess files are cleared at the end.
Why do you build it in CI? It is meant to be pulled from the Docker registry to CI.
I have two CI runners. One can build Docker containers and the other uses them. For security reasons the one that uses them does not have access to the internet. Having to pull 500 MB of Docker container is still a waste of disk space. Slimming this down by reducing the number of layers will cut the required disk space significantly. Using an OS that was built with Docker in mind (such as Alpine or debian:*-slim) will further reduce the overhead the container imposes.
The idea behind installing so many packages is to have a flexible test environment out of the box.
I was under the impression that with Docker containers, less is more. For flexibility and completeness it would make more sense to switch to VirtualBox or QEMU. And my initial complaint wasn't even about the number of packages, it was about network usage during build time.
Why so many updates? Due to the Docker caching system.
Please explain, because as far as I know Docker caches the layers of a container in the order they follow each other. My understanding of Docker caching is the following: you have a Docker container with 3 layers, L1, L2 and L3. For simplicity let's assume the first 7 digits of their hashes are 1111111, 2222222 and 3333333. If one of them changes (which surely happens when you run apt update in the container, because of the timestamps in the local apt cache), the subsequent hash won't match and the following layers won't match either. In other words, running apt update more than once screws your Docker cache over.
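To make the downstream part of that argument concrete: regardless of what triggers it, once an early layer has to be rebuilt, everything after it is rebuilt as well. A minimal sketch (placeholder base image and packages; whether a layer is actually invalidated depends on the builder's cache rules):

```dockerfile
FROM debian:bookworm-slim

# Layer 1: if anything causes this layer to be rebuilt (a changed instruction,
# a changed base image, building with --no-cache, ...), its cache entry is gone ...
RUN apt-get update

# Layer 2: ... which means this layer cannot be served from the cache either,
# even though its own instruction text is unchanged ...
RUN apt-get install -y git

# Layer 3: ... and the same applies to every layer that follows.
RUN apt-get install -y curl
```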
Any excess files are cleared at the end.
By updating your repository cache again?
apt autoremove only works if packages accidentally get left behind during an apt remove. There are no packages being uninstalled from the container during build time.
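As a side note for readers, the autoremove distinction being drawn here can be sketched roughly as follows; the package choice is hypothetical and this is not taken from the project's Dockerfile.

```dockerfile
FROM debian:bookworm-slim

# `apt-get autoremove` only deletes packages that were installed automatically
# as dependencies and whose reverse-dependencies were later removed. Installing
# build-essential pulls in many such dependencies, and removing it again is what
# gives autoremove something to clean up. Autoremove never touches the index
# files that `apt-get update` leaves under /var/lib/apt/lists/.
RUN apt-get update \
 && apt-get install -y build-essential \
 && apt-get remove -y build-essential \
 && apt-get autoremove -y
```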
@tvanrielwendrich thanks for writing this up.
Why do you build it in CI? It is meant to be pulled from the Docker registry to CI.
I have two CI runners. One can build Docker containers and the other uses them. For security reasons the one that uses them does not have access to the internet. Having to pull 500 MB of Docker container is still a waste of disk space. Slimming this down by reducing the number of layers will cut the required disk space significantly. Using an OS that was built with Docker in mind (such as Alpine or debian:*-slim) will further reduce the overhead the container imposes.
In your scenario this might work better; however, I would put some question marks around your workflow.
- Using the prebuilt container, it is downloaded once, and when it is updated only some layers are actually re-downloaded thanks to Docker layering - which saves network load compared to fetching all the libs and building the image locally on every git push, over and over again.
- The same applies to disk space - while building, lots of temp files are written locally, whereas they are removed from the prebuilt package - so if you compare package size only, you get some gain (but let's remember that you've used a different distro and haven't enabled all the packages that exist in the original).
- Of course, using Alpine would reduce the size significantly, but that is not the main priority of this package and so far nobody has complained about the package size.
- I don't see any security improvement in your approach; if you have any doubts about it you can always create your own Docker repo and pull from it.
- In the real world CI often runs 1000+ tests, which can take several hours, so the size and download time are really not the bottleneck here.
The idea behind installing so many packages is to have a flexible test environment out of the box.
I was under the impression that with Docker containers, less is more. For flexibility and completeness it would make more sense to switch to VirtualBox or QEMU. And my initial complaint wasn't even about the number of packages, it was about network usage during build time.
Yes, in production it is very true that images should be as light as possible, but again this is a testing lib, and its main goal is to be easily included in a CI process, on different systems, with different configurations, on multiple remote runners, etc., so having it all in one package is actually the benefit.
Why so many updates? Due to the Docker caching system.
Please explain, because as far as I know Docker caches the layers of a container in the order they follow each other. My understanding of Docker caching is the following: you have a Docker container with 3 layers, L1, L2 and L3. For simplicity let's assume the first 7 digits of their hashes are 1111111, 2222222 and 3333333. If one of them changes (which surely happens when you run apt update in the container, because of the timestamps in the local apt cache), the subsequent hash won't match and the following layers won't match either. In other words, running apt update more than once screws your Docker cache over.
All of that seems fine at first glance, but the upstream packages change a lot, so after some time building the package on Docker Cloud started causing build errors. To reduce that as much as possible everything would have to be almost a one-liner, which on the other hand would disable all the benefits of the layer caching functionality.
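One possible middle ground, sketched here purely as an assumption about what a compromise could look like (not the project's actual layout), is to group related installs into a few RUN layers, each of which refreshes and then cleans the index itself:

```dockerfile
FROM debian:bookworm-slim

# Base tooling that changes rarely: good cache hit rate.
RUN apt-get update \
 && apt-get install -y --no-install-recommends git curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*

# Test tooling that changes more often: rebuilding this layer does not force
# the base-tooling layer above to be rebuilt.
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 python3-pip \
 && rm -rf /var/lib/apt/lists/*
```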
Any excess files are cleared at the end.
By updating your repository cache again?
apt autoremove only works if packages accidentally get left behind during an apt remove. There are no packages being uninstalled from the container during build time.
Temp files are removed, which reduces the size of the final image.
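For anyone who wants to verify where the size actually goes, the layer-by-layer breakdown can be inspected with the standard docker CLI; the image name below is a placeholder.

```sh
# Show each layer with the instruction that created it and its size; leftover
# apt index files or build caches show up as unexpectedly large layers.
docker history --no-trunc your-image:latest

# Total size of the final image as a runner would pull it.
docker images your-image:latest
```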
So, as said before, your approach might be better for your scenario, because in the end it is tailored to your needs, but as in the earlier comments, in my opinion it breaks some fundamental concepts, and that is probably why you are trying to reduce the package size to the absolute minimum in the first place.