docker-library/tomcat

use multistage builds

TrentonAdams opened this issue · 5 comments

The only way to ensure a "clean" image in the end is to create a new one. So this cleanup has no effect: the image size will not change because of a yum clean all, for example. It is better to use multi-stage builds.

# clean up anything added temporarily and not later marked as necessary

So, for example, at the top of the Dockerfile you might have something like:

FROM amazoncorretto:8 AS init
# do common stuff here

FROM init AS build
# do build stuff here

FROM init AS final
# do final stuff here
COPY --from=build /some/build/path /final/path

Come to think of it, I'm not sure how this will help. I guess after a yum install you could copy the downloaded rpms and install them manually as part of a "final" image, so that it doesn't update the yum db? Anyhow, just a thought.

That whole segment is one RUN line, so removing packages there does affect image size.

RUN set -eux; \
\
# http://yum.baseurl.org/wiki/YumDB.html
if ! command -v yumdb > /dev/null; then \
yum install -y yum-utils; \
yumdb set reason dep yum-utils; \
fi; \
if [ -f /etc/oracle-release ]; then \
# TODO there's an odd bug on Oracle Linux where installing "cpp" (which gets pulled in as a dependency of "gcc") and then marking it as automatically-installed will result in the "filesystem" package being removed during "yum autoremove" (which then fails), so we set it as manually-installed to compensate
yumdb set reason user filesystem; \
fi; \
# a helper function to "yum install" things, but only if they aren't installed (and to set their "reason" to "dep" so "yum autoremove" can purge them for us)
_yum_install_temporary() { ( set -eu +x; \
local pkg todo=''; \
for pkg; do \
if ! rpm --query "$pkg" > /dev/null 2>&1; then \
todo="$todo $pkg"; \
fi; \
done; \
if [ -n "$todo" ]; then \
set -x; \
yum install -y $todo; \
yumdb set reason dep $todo; \
fi; \
) }; \
_yum_install_temporary gzip tar; \
\
ddist() { \
local f="$1"; shift; \
local distFile="$1"; shift; \
local success=; \
local distUrl=; \
for distUrl in \
# https://issues.apache.org/jira/browse/INFRA-8753?focusedCommentId=14735394#comment-14735394
'https://www.apache.org/dyn/closer.cgi?action=download&filename=' \
# if the version is outdated (or we're grabbing the .asc file), we might have to pull from the dist/archive :/
https://www-us.apache.org/dist/ \
https://www.apache.org/dist/ \
https://archive.apache.org/dist/ \
; do \
if curl -fL -o "$f" "$distUrl$distFile" && [ -s "$f" ]; then \
success=1; \
break; \
fi; \
done; \
[ -n "$success" ]; \
}; \
\
ddist 'tomcat.tar.gz' "tomcat/tomcat-$TOMCAT_MAJOR/v$TOMCAT_VERSION/bin/apache-tomcat-$TOMCAT_VERSION.tar.gz"; \
echo "$TOMCAT_SHA512 *tomcat.tar.gz" | sha512sum --strict --check -; \
ddist 'tomcat.tar.gz.asc' "tomcat/tomcat-$TOMCAT_MAJOR/v$TOMCAT_VERSION/bin/apache-tomcat-$TOMCAT_VERSION.tar.gz.asc"; \
export GNUPGHOME="$(mktemp -d)"; \
for key in $GPG_KEYS; do \
gpg --batch --keyserver ha.pool.sks-keyservers.net --recv-keys "$key"; \
done; \
gpg --batch --verify tomcat.tar.gz.asc tomcat.tar.gz; \
tar -xf tomcat.tar.gz --strip-components=1; \
rm bin/*.bat; \
rm tomcat.tar.gz*; \
command -v gpgconf && gpgconf --kill all || :; \
rm -rf "$GNUPGHOME"; \
\
# https://tomcat.apache.org/tomcat-9.0-doc/security-howto.html#Default_web_applications
mv webapps webapps.dist; \
mkdir webapps; \
# we don't delete them completely because they're frankly a pain to get back for users who do want them, and they're generally tiny (~7MB)
\
nativeBuildDir="$(mktemp -d)"; \
tar -xf bin/tomcat-native.tar.gz -C "$nativeBuildDir" --strip-components=1; \
_yum_install_temporary \
apr-devel \
gcc \
make \
openssl-devel \
; \
( \
export CATALINA_HOME="$PWD"; \
cd "$nativeBuildDir/native"; \
aprConfig="$(command -v apr-1-config)"; \
./configure \
--libdir="$TOMCAT_NATIVE_LIBDIR" \
--prefix="$CATALINA_HOME" \
--with-apr="$aprConfig" \
--with-java-home="$JAVA_HOME" \
--with-ssl=yes; \
make -j "$(nproc)"; \
make install; \
); \
rm -rf "$nativeBuildDir"; \
rm bin/tomcat-native.tar.gz; \
\
# mark any explicit dependencies as manually installed
deps="$( \
find "$TOMCAT_NATIVE_LIBDIR" -type f -executable -exec ldd '{}' ';' \
| awk '/=>/ && $(NF-1) != "=>" { print $(NF-1) }' \
| sort -u \
| xargs -r rpm --query --whatprovides \
| sort -u \
)"; \
[ -z "$deps" ] || yumdb set reason user $deps; \
\
# clean up anything added temporarily and not later marked as necessary
yum autoremove -y; \
yum clean all; \
rm -rf /var/cache/yum; \
\
# sh removes env vars it doesn't support (ones with periods)
# https://github.com/docker-library/tomcat/issues/77
find ./bin/ -name '*.sh' -exec sed -ri 's|^#!/bin/sh$|#!/usr/bin/env bash|' '{}' +; \
\
# fix permissions (especially for running as non-root)
# https://github.com/docker-library/tomcat/issues/35
chmod -R +rX .; \
chmod 777 logs temp work
# verify Tomcat Native is working properly
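
For anyone puzzling over the "mark any explicit dependencies" step in the quoted RUN line, here is a minimal sketch of what its awk filter keeps. The ldd output below is made up for illustration (the library names and addresses are assumptions, not taken from a real image); only the awk expression comes from the Dockerfile.

```shell
#!/bin/sh
set -eu

# Hypothetical ldd output; real output depends on the system and the library.
sample_ldd_output='  linux-vdso.so.1 =>  (0x00007ffc1a5f2000)
  libapr-1.so.0 => /lib64/libapr-1.so.0 (0x00007f3b2c1d0000)
  libssl.so.10 => /lib64/libssl.so.10 (0x00007f3b2bf5e000)
  libapr-1.so.0 => /lib64/libapr-1.so.0 (0x00007f3b2c1d0000)
  /lib64/ld-linux-x86-64.so.2 (0x00007f3b2c609000)'

# Same filter as in the Dockerfile: keep lines with "=>" whose second-to-last
# field is a resolved path (not the "=>" itself), then de-duplicate.
resolved_paths="$(
	printf '%s\n' "$sample_ldd_output" \
		| awk '/=>/ && $(NF-1) != "=>" { print $(NF-1) }' \
		| sort -u
)"

printf '%s\n' "$resolved_paths"
# prints:
# /lib64/libapr-1.so.0
# /lib64/libssl.so.10
```

In the Dockerfile those paths are then passed to rpm --query --whatprovides, and the owning packages are marked as user-installed so that yum autoremove leaves them in place.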

See also https://github.com/docker-library/faq#multi-stage-builds

That whole segment is one RUN line, so removing packages there does affect image size.

Interesting, can you point me to the docs where using a single RUN does affect image size?

That's how Dockerfiles work, in general; https://docs.docker.com/engine/reference/builder/#run:

The RUN instruction will execute any commands in a new layer on top of the current image and commit the results. The resulting committed image will be used for the next step in the Dockerfile.

Each RUN instruction is roughly the same as doing docker run <previous-layer> <run-command> followed by docker commit <container-id>, so anything added and removed in a single RUN command cannot possibly persist (because by the time Docker goes to initiate the commit action, the files are gone):

$ docker pull alpine:3.11
3.11: Pulling from library/alpine
Digest: sha256:b276d875eeed9c7d3f1cfa7edb06b22ed22b14219a7d67c52c56612330348239
Status: Image is up to date for alpine:3.11
docker.io/library/alpine:3.11

$ cat Dockerfile1
FROM alpine:3.11
RUN dd if=/dev/urandom of=/test bs=1M count=1

$ cat Dockerfile2
FROM alpine:3.11
RUN dd if=/dev/urandom of=/test bs=1M count=1 && rm /test

$ docker build - < Dockerfile1
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM alpine:3.11
 ---> a187dde48cd2
Step 2/2 : RUN dd if=/dev/urandom of=/test bs=1M count=1
 ---> Running in 5d3476d7a761
1+0 records in
1+0 records out
Removing intermediate container 5d3476d7a761
 ---> 362f95f15851
Successfully built 362f95f15851
$ docker history 362f95f15851
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
362f95f15851        19 seconds ago      /bin/sh -c dd if=/dev/urandom of=/test bs=1M…   1.05MB              
...

$ docker build - < Dockerfile2
Sending build context to Docker daemon  2.048kB
Step 1/2 : FROM alpine:3.11
 ---> a187dde48cd2
Step 2/2 : RUN dd if=/dev/urandom of=/test bs=1M count=1 && rm /test
 ---> Running in e8e8d2941b1d
1+0 records in
1+0 records out
Removing intermediate container e8e8d2941b1d
 ---> 70553719c1cd
Successfully built 70553719c1cd
$ docker history 70553719c1cd
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
70553719c1cd        17 seconds ago      /bin/sh -c dd if=/dev/urandom of=/test bs=1M…   0B                  
...

I did notice, though, that if you do the yum clean all && rm -rf /var/cache/yum as a new RUN line, it increases the image size rather than reducing it. Very strange.
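
That observation is consistent with the commit-per-RUN behaviour described above: a later layer can only add to what earlier layers already contain, so a cleanup in its own RUN cannot shrink them. One likely explanation for the small increase is that the new layer has to record the deletions and any files the command modified. The following is a toy model in plain shell, not Docker itself: the helper names (rootfs_bytes, run_step) are made up for this sketch, and a "layer" is simply the number of bytes that exist at commit time and did not exist before the step.

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for the image filesystem.
rootfs="$(mktemp -d)"
trap 'rm -rf "$rootfs"' EXIT

# Total bytes of regular files currently in the scratch rootfs.
rootfs_bytes() {
	find "$rootfs" -type f -exec wc -c {} + \
		| awk '$2 != "total" { sum += $1 } END { print sum + 0 }'
}

# Run one "RUN" step and print the size of the layer it would commit.
run_step() {
	before="$(rootfs_bytes)"
	( cd "$rootfs" && sh -c "$1" ) > /dev/null 2>&1
	after="$(rootfs_bytes)"
	added=$(( after - before ))
	# Assumption of this model: deleting files never shrinks earlier layers.
	[ "$added" -gt 0 ] || added=0
	echo "$added"
}

# Case 1: create and remove the file within a single step.
single="$(run_step 'dd if=/dev/zero of=test bs=1024 count=1024 && rm test')"

# Case 2: create in one step, remove in the next.
step1="$(run_step 'dd if=/dev/zero of=test bs=1024 count=1024')"
step2="$(run_step 'rm test')"
split=$(( step1 + step2 ))

echo "one RUN (create && rm): $single bytes"
echo "two RUNs (create, then rm): $split bytes"
# prints:
# one RUN (create && rm): 0 bytes
# two RUNs (create, then rm): 1048576 bytes
```

The model does not reproduce the extra bytes a real cleanup layer may add; it only shows why the cleanup has to live in the same RUN as the install for the image to end up smaller.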