[Hackathon 2024][Gadolinium][Docker] Don’t include unnecessary files in image
Neobody-0 opened this issue · 0 comments
Rule title
Don’t include unnecessary files in image
Language and platform
Docker
Rule description
Eliminating unnecessary files will naturally decrease the image size. A smaller image shortens build time and lowers storage costs, contributing to energy savings and environmentally friendly coding practices.
One way to avoid unnecessary files is to use a .dockerignore file.
When you run a build command, the build client looks for a .dockerignore file in the root directory of the build context. If the file is present, the client excludes the files and directories that match its patterns from the build context before sending the context to the builder.
https://docs.docker.com/build/building/context/#dockerignore-files
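As an illustration of the mechanism described above, a minimal .dockerignore might look like the sketch below. The directory and file names (data, tmp, .env) are hypothetical and would need to be adapted to the actual project.

```dockerignore
# Version control metadata is not needed inside the image
.git
.gitignore

# Hypothetical data and scratch directories in the context root
data
tmp

# Log files at any depth of the build context
**/*.log

# Local environment file that may contain secrets
.env

# Exclude Markdown files in the context root, but keep README.md
*.md
!README.md
```

Patterns are resolved relative to the root of the build context, and a line starting with `!` re-includes a path that an earlier pattern excluded.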
Rule short description
Exclude files not relevant to the build, without restructuring your source repository. https://docs.docker.com/develop/develop-images/guidelines/
Rule justification
As an example, the article "Ten simple rules for writing Dockerfiles for reproducible data science" (PMC, nih.gov) points out that an image can end up containing content the final image does not need (data, temporary files, dependencies, etc.). As the article puts it: “Storing data files outside of the container allows handling of very large or sensitive datasets, e.g., proprietary data or private information. Do not include such data in an image! To avoid publishing sensitive data by accident, you can add the data directory to the .dockerignore file, which excludes files and directories from the build context, i.e., the set of files considered by docker build. Ignoring data files also speeds up the build in cases where there are very large files or many small files.”
Why it matters:
- Image size reduction: Larger images require more storage space, longer transfer times, and more resources for loading into memory and processing.
- Faster builds: Ignoring data files can speed up the build when large files or many small files are involved.
- Security: Temporary files may hold sensitive data, including secrets or debug information, which creates a risk of data leakage.
Severity / Remediation Cost
Severity: Major. Some files can be huge; for example, node_modules can reach 400 MB, many times the size of an Alpine base image.
Remediation cost: Easy. Users only need to add or complete the .dockerignore file.
Implementation principle
The feasibility of the implementation hinges on the ability to scan the .dockerignore file with SonarQube. If this is achievable, we can verify its presence and possibly employ a template (similar to .gitignore) to enumerate all the files that should be omitted.
An enhancement to this rule, though potentially challenging to implement, would be to examine the base image in the Dockerfile to identify the technology and apply a corresponding template for the .dockerignore file.
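For instance, if the Dockerfile's base image were a Node.js image (say `FROM node:20-alpine`, a hypothetical example), a matching template could resemble the sketch below. The entries are assumptions about a typical Node.js project layout, not a list defined by this rule.

```dockerignore
# Hypothetical .dockerignore template for a Node.js base image

# Dependencies, assuming the Dockerfile installs them during the build
**/node_modules

# npm debug logs
npm-debug.log*

# Test coverage reports and local build output (directory names are assumed)
coverage
dist
```

A rule implementation could then compare the project's .dockerignore against such a template and report the missing entries.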