Agenda
- Recap: What’s Docker
- BuildKit
- The most important Docker rules
- Docker and Docker Compose
- Bind Mounts / Volumes
- Handling Large Models
- ARG vs. ENV vs. .env vs. env_file
- Docker Secrets
- ENTRYPOINT vs. CMD vs. RUN
- Experimental BuildKit Mount Types
- Multi-Stage Docker Images
- Useful Tools: dive, lazydocker, dcc
- Multi-Architecture Images
- Docker Pitfalls
🤚 In a hands-on session we will interactively optimize an existing Dockerfile
Recap: What's Docker?
https://www.docker.com/resources/what-container
Docker Vocabulary
Numbers and Facts
- 4 of 5 container runtimes are Docker
- 3 of 4 container orchestrators are Kubernetes
- 1 of 5 containers live less than 1 minute
- Half of the containers live less than 5 minutes
- Half of the images have security flaws
- Half of the images are replaced within 1 week
Source: Sysdig 2019 Container Usage Report: New Kubernetes and security insights
New in the Docker World: BuildKit
BuildKit is a new building and packaging software for Docker
- 🔥 Build caches & mount types
- 🔥 Concurrent multistage builds
- 🔥 Multi CPU architecture builds (e.g. linux/amd64, linux/arm)
Enabling BuildKit on your machine (Docker >= 18.09):
Command line
docker buildx
DOCKER_BUILDKIT=1 docker ...
Daemon config
{ "features": { "buildkit": true }}
The most important Docker rules
- Reduce the size of our shipment, because it reduces
- … the amount of required storage
- … transferring times
- … the attack surface
- Reduce build time, because
- … only fast builds will be executed frequently
- Make builds reproducible
Dockerfile + docker-compose.yml
Standardized and reproducable definition of image name, runtime parameters, ports, volumes etc…
Via Command Line only
docker build . -t myveryspecialname:1.0
docker run -p “8080:8080” -v “/var/folder:/sth:ro” myveryspecialname:1.0
- 👎 cannot be put into git
- 👎 easy to mess-up
- 👎 hard to remember
- 👎 individual for every image and machine
Custom builds scripts / makefiles are helping, but they are introducing individualism that is unwanted here
Using the Docker-Compose file
# old way: docker-compose build
docker buildx bake
docker-compose up
image: myveryspecialname:1.0
build: .
ports:
- 8080:8080
volumes:
- "/var/somespecialfolder:/sth:ro"
- 👍 can be versioned
- 👍 syntax highlighting and check
- 👍 write and forget - no remembering needed!
- 👍 works on all machines with every image
Bind Mounts and Volumes
https://docs.docker.com/storage/bind-mounts/
Bind Mounts vs Volumes
- Bind Mounts allow to share a file or folder with the host system
- Useful for config files or important data
- Allows to ship external data alongside docker-compose.yml
- Volumes allow to persist a file or folder across container instances
- Useful for temp data such as indecies or caches
- Not really portable
Performance of Bind Mounts
- Performance depends on the guaranteed level of consistency
- “like native” on Linux (VFS)
- “it depends” on macOS / Windows (non-VFS hosts)
- the consistency expectation can be defined (since 17.04)
consistent
- default, perfect consistency, expensivecached
- the host’s view is authoritativedelegated
- the container’s view is authoritative
Source: https://docs.docker.com/storage/bind-mounts/
Example for an Bind Mount (new syntax)
services:
service:
volumes:
- type: bind
source: ${PWD}/data/
target: /data
consistency: cached
read_only: true
Handling Large Data Dependencies
The real question is: what is your usage scenario?
One strategy to tackle large data dependencies
- Bake it into the image 🤔
Further strategies to tackle large data dependencies
- Streaming Platforms - Apache Kafka, RabbitMQ, ...
- Database Clusters - Elastic, PostgreSQL, ...
- Database-as-a-Service - Azure SQL Database, Amazon DocumentDB, ...
- Blob Storages - Amazon S3, Google Cloud Storage, ...
- Model Server - TensorFlow Serving, Clipper, TensorRT
- Distributed File Systems - Ceph-FS, GlusterFS, ...
ARG vs. ENV vs. .env vs. env_file
Hierarchy and scope of variables in Docker
-
Dockerfile variable definitions
-
Arguments (valid during build)
ARG myvar=variable
-
Environment variables (valid during build and run)
ENV myvar=variable
-
-
Docker-Compose variable definitions
-
Environment statement (valid during run)
environment: myvar=${SYSENV:-variable}
-
env_file statement (valid during run)
env_file: env_file_name
-
Passwords in Environment Variables?
This violates the principle of least privilege and the principle of least astonishment:
- Environment variables are clear text
- Environment variables are easy to read on the host
- Environment variables are inherited to all child-processes
- Environment variables are often logged for debugging
Source: https://diogomonica.com/2017/03/27/why-you-shouldnt-use-env-variables-for-secret-data/
People do not seem to care
Secrets
Handling secrets the Docker way involves two steps:
- Use what Docker provided to us for passwords
- Adjust our Docker image entrypoint
Use what Docker provided to us for passwords
-
Define the secret with a name and with a reference to a txt file
-
Assign the secret to a container.
services: service: secrets: - my_secret secrets: my_secret: file: ./my_secret.txt
-
Inside the container, the secrets is available at
/run/secrets/name
Adjust our Docker image entrypoint
-
Instead of defining the password via environment variable
SQL_DB_PASSWORD=mysecretpassword
It is current practice to add the suffix _FILE and to refer to a secret path
SQL_DB_PASSWORD_FILE=/run/secrets/sql-db-password
- Custom entrypoint scripts resolve the path in SQL_DB_PASSWORD_FILE and create a local variable SQL_DB_PASSWORD with its content. Example: MySQL Entrypoint
# usage: file_env VAR [DEFAULT]
# ie: file_env 'XYZ_DB_PASSWORD' 'example'
# (will allow for "$XYZ_DB_PASSWORD_FILE" to fill in the value of
# "$XYZ_DB_PASSWORD" from a file, especially for Docker's secrets feature)
file_env() {
local var="$1"
local fileVar="${var}_FILE"
local def="${2:-}"
if [ "${!var:-}" ] && [ "${!fileVar:-}" ]; then
mysql_error "Both $var and $fileVar are set (but are exclusive)"
fi
local val="$def"
if [ "${!var:-}" ]; then
val="${!var}"
elif [ "${!fileVar:-}" ]; then
val="$(< "${!fileVar}")"
fi
export "$var"="$val"
unset "$fileVar"
}
RUN, ENTRYPOINT, CMD
Dockerfile statement | build phase | run phase | purpose |
---|---|---|---|
RUN | ✅ | Execute an command, commit result into image | |
ENTRYPOINT | ✅ | Specify what to do when container is executed | |
CMD | ✅ | Augment ENTRYPOINT with addition parameters |
Dockerfile
RUN apt-get install glibc
ENTRYPOINT java /app.jar -serve
CMD -Xmx4G
docker-compose.yml
services:
service:
command: -Xmx=14G
Experimental BuildKit Mount Types
-
A new experimental Dockerfile frontend syntax allows us to add cool new features to
RUN
commands:RUN --mount=type=<type>,target=<target> <path-to-app>
-
To enable the experimental frontend, add this line to your
Dockerfile
# syntax = docker/dockerfile:experimental
Source: https://github.com/moby/buildkit/blob/master/frontend/dockerfile/docs/experimental.md
Bind mounts
RUN --mount=type=bind,target=/sth/ cat /sth/lala
- Bind-mounts an external folder (read only)
- Suitable for large file dependencies or source code
Cache mounts
RUN --mount=type=cache,target=/root/.m2 mvn package
RUN --mount=type=cache,target=/var/lib/apt/lists apt update
RUN --mount=type=cache,target=/root/.cache/pip pip update
RUN --mount=type=cache,target=/root/.cache/go-build go build
- Caches a specified folder across builds
- Suitable for dependencies and repositories
Multi-Stage - Dos and Don'ts
Good
FROM ubuntu as build
RUN … // build your stuff here
FROM scratch
COPY --from=build artifact.bin /
ENTRYPOINT ./artifact.bin
Bad
FROM ubuntu as build
RUN … // build your stuff here
FROM alpine as base
COPY --from=build artifact.bin /
ENTRYPOINT ./artifact.bin
FROM base as debug
ENV debug=1
FROM base as prod
ENV debug=0
ENV prod=1
⚠️ Images cannot be distinguished by their build stage⚠️ Mix of configuration and the resulting build artifact
Multi-Stage: Installing vs Base Images
The work from others can be integrated more easily.
Example scenario: The git
command is required during the build of an image. What can we do?
186MB - Ubuntu
FROM ubuntu # 64 MB
# + 122 MB
RUN apt-get update && \
apt-get install -y git
166MB - Ubuntu
FROM ubuntu # 64 MB
# + 102 MB
RUN apt-get update && \
apt-get -y install --no-install-recommends \
git
141MB - Ubuntu
FROM ubuntu # 64 MB
# + 77 MB
RUN apt-get update && \
apt-get -y install --no-install-recommends \
git \
&& rm -rf /var/lib/apt/lists/*
Only 75% of the original installation!
22,6MB - alpine
FROM alpine # 5,6 MB
# + 17 MB
RUN apk add --update git
21,6MB - alpine
FROM alpine # 5,6 MB
# + 16 MB
RUN apk add --no-cache --update git
Only 15% of the size of the smallest Ubuntu image
28MB - alpine/git
FROM alpine/git # 28 MB
Maven Dockerfile Boilerplate 2019
# syntax = docker/dockerfile:experimental
FROM maven:3.5.0-jdk-8-alpine as build
WORKDIR /app
RUN --mount=target=. --mount=type=cache,target=/root/.m2/ \
mvn package -DbuildDirectory=/target
FROM openjdk:8-alpine
RUN apk add --no-cache --update curl
WORKDIR /
COPY --from=build /target/*-standalone.jar /app.jar
ENTRYPOINT ["java", "-jar", "/app.jar"]
CMD -Xmx=2G
EXPOSE 8080
HEALTHCHECK --interval=2m --timeout=10s CMD\
curl --fail http://0.0.0.0:8080/health || exit 1
Experimental Mounts for the Maven execution
RUN --mount=target=. --mount=type=cache,target=/root/.m2/ \
mvn package -DbuildDirectory=/target
- Mount bind of the external Java-Source folder
- No
COPY
of the sources, creation of one layer saved - No previous dependency management, creation of another layer saved
- Cache mount of
/root/.m2/
, dependencies stored there will be cached across image builds
Minor Maven modification
Maven must be instructed to use a different output directory, via
-DbuildDirectory=/target
requires minor changes in the pom.xml:
<project
<properties>
<buildDirectory>${project.basedir}/target</buildDirectory>
<build>
<directory>${buildDirectory}</directory>
Minimal footprint curl installation
RUN apk add --no-cache --update curl
CMD augments ENTRYPOINT
ENTRYPOINT ["java", "-jar", "/app.jar"]
CMD -Xmx=2G
-
Allows to override application-specific parameters easily, without messing arround with the application execution
-
For example, it is easy to assign more heap via docker-compose:
version: "3.7" services: service: build: . command: -Xmx=4G
Built-in Healthcheck
HEALTHCHECK --interval=2m --timeout=10s CMD\
curl --fail http://0.0.0.0:8080/health || exit 1
The container knows best how to monitor its health
Nice Tools
dive, lazydocker, dcc
Dive
A tool for exploring a docker image, layer contents, and discovering ways to shrink the size of your Docker/OCI image.
https://github.com/wagoodman/dive
lazydocker
A simple terminal UI for both docker and docker-compose.
https://github.com/jesseduffield/lazydocker
dcc
Builds and launches docker-compose services, monitors their output using multitail, opens exposed ports automatically in a browser.
https://github.com/heussd/dotfiles/blob/master/scripts/.scripts/dcc
Hands-On Session
Demo Scenario: Publishing a Markdown file as static webpage with two dependencies
- Clone md-page - JS that converts Markdown to HTML on the fly
- Clone my-way-to-view-things - Timm's prefered CSS for text reading
- Combine Readme.md + JS + CSS
- Serve in static Web server
We start with a Dockerfile like this
FROM ubuntu
# Install requirements
RUN apt-get update && apt-get install -y git lighttpd
WORKDIR /var/www/html/
# Pull dependencies
RUN git clone https://github.com/oscarmorrison/md-page
RUN git clone https://github.com/heussd/my-way-to-view-things
# Copy sources
COPY img ./img/
COPY Readme.md ./
# Build artifact
RUN echo '<script src="md-page/md-page.js"></script><noscript>' > 'index.html' && \
echo '<link rel="stylesheet" type="text/css" href="my-way-to-view-things/text-reading.css" media="screen" />' >> 'index.html' && \
echo '' >> 'index.html' && \
cat "Readme.md" >> 'index.html'
# Serve artifact
CMD ["lighttpd", "-D", "-f", "/etc/lighttpd/lighttpd.conf"]
Deep-dive
- apt command
- We see how many binaries are added for the git installation
- We see the apt caches that are added
- clone command
- We notice the
.git
folders
- We notice the
Fixing the Dockerfile
Replace the git commands
FROM alpine/git as git
WORKDIR /git
RUN git clone URL /git && \
git checkout -q "COMMIT"
Introduce gostatic
https://hub.docker.com/r/pierrezemb/gostatic
Build with BuildKit
Hands-on Recap
By following good practices and 🔥 Docker features, we drastically reduced build time and image size
- 97% size reduction
- 91% build time reduction
- Decreased the attack surface
- Increase scalability and reproducability
Multi-Architecture Images
Image Source: https://engineering.docker.com/2019/04/multi-arch-images/
Image Manifests
Image Source: https://www.docker.com/blog/multi-arch-all-the-things/
How manifest lists look like
How manifests look like
How to use
setup:
docker buildx create --name "nubuilder" --use
build:
docker buildx build --platform linux/amd64,linux/arm/v7 -t $(IMAGE_NAME) --push .
clean:
docker buildx rm nubuilder
Docker Pitfalls
Terminology: Run
RUN
inDockerfiles
runs code during image build.docker run
runs containers.
Terminology: Arguments, Commands
ARGS
are environment variables during the build phase.- Runtime arguments can be specified using
CMD
(Dockerfile
) orcommands
(docker-compose). - Docker commands are command line parameters to the Docker binary (such as
docker images
).
Scope of Environment Variables
- Environment variables in Dockerfile are respected during build.
- Environment variables in docker-compose are ignored during build (even if you build with
docker-compose build
)
Terminology: Volumes and Mounts
-
Both are specified, used and maintained with the keyword volumes.
-
Bind mounts are volumes of the type bind in docker-compose:
volumes: - type: bind
Volumes during build
- Volumes persist writes during run, but not during build.
- Writes during build are silently ignored without any warning.
- You cannot be sure what folder is an volume, as images can define arbitrary folders as volumes.
Conclusion
- Build smaller images, faster
- Cache all the things
- Be multi-arch friendly