Simplify and organize container builds
Right now, each project's Dockerfile lives in that project's root directory, and builds use the root of the whole repository as the build context so that local dependencies can be copied into the image and installed via Poetry. Each dependency, including the code for the project itself, has to be added explicitly in its own COPY statement. The advantages here are that
- Dockerfiles get to live with the applications they're intended to execute, keeping things organized
- Rebuilding is only required when one of the dependency directories changes
- Dependency code can be volume-mapped from the host into the container at runtime for easier development
- Each project can use the Poetry/Python version required for its purposes (also potentially a disadvantage, see below)
However, the disadvantages are that
- Individual Dockerfiles are less clear, since the COPY paths are relative to the build context root and not the directory containing the Dockerfile (which is not obvious unless you inspect the CI yamls)
- Specifying each project as a dependency to itself is redundant. Even having to specify the local dependencies is redundant, since technically we should already know these from the project's pyproject.toml or poetry.lock.
- Images are needlessly bloated by requiring that all the source code be added and live with the container forever, even though in production all we need are the built library wheels
- As the code base grows, sending the entire repo as the build context to the Docker engine could become really onerous
- No guarantees that projects are built against the same Poetry and Python versions
Possible Solutions
Use Makefiles as outlined here
Advantages
- Necessary dependencies and applications themselves are installed into containers automatically
- make ensures that rebuilds only happen when the relevant libraries change
- Build contexts are isolated to project directories, reducing the size of the context
- Only copying built libraries reduces the size of the image
- Projects are all built against the same (local to build) Python and Poetry versions
Disadvantages
- Extra dependency on make and familiarity with Makefile syntax
- Makefile syntax in tools makes certain assumptions about the relative directory depths of applications and libraries
- Dockerfiles depend on products of local builds, defeating the purpose of isolated container environments
- Dockerfiles provide almost no clarity about what's going into them
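For anyone less familiar with make, the rebuild rule behind the second advantage can be sketched in Python: a target is rebuilt only when it is missing or older than one of its prerequisites. This is a simplified illustration of that timestamp comparison, not the Makefile from the linked example. The name needs_rebuild and the file names are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def needs_rebuild(target: Path, sources: list[Path]) -> bool:
    """Return True if target is missing or older than any source."""
    if not target.exists():
        return True
    built = target.stat().st_mtime
    return any(src.stat().st_mtime > built for src in sources)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "lib.py"
    wheel = Path(tmp) / "lib.whl"
    source.write_text("x = 1\n")

    print(needs_rebuild(wheel, [source]))  # True: wheel not built yet

    wheel.write_text("built")
    os.utime(source, (1_000, 1_000))  # source last changed at t=1000
    os.utime(wheel, (2_000, 2_000))   # wheel built afterwards
    print(needs_rebuild(wheel, [source]))  # False: wheel is up to date

    os.utime(source, (3_000, 3_000))  # source edited after the build
    print(needs_rebuild(wheel, [source]))  # True: wheel is stale
```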
Global base image, project-specific base and build images
Build begins with a global build image which adds libraries and installs the desired Poetry version
ARG PYTHON_TAG
FROM python:${PYTHON_TAG}
ARG POETRY_VERSION
RUN python -m pip install poetry==${POETRY_VERSION}
COPY libs /opt/gw-iaas/libs
built by something like
docker build --build-arg PYTHON_TAG=<tag> --build-arg POETRY_VERSION=<version> -t build .
Then, for individual projects, the build starts with a Python script that builds all dependency wheels via something like the following (making docker a dependency in the root pyproject.toml):
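import argparse
import pathlib
import re

import docker

parser = argparse.ArgumentParser()
parser.add_argument("--project", required=True, type=str, help="Path to project")
args = parser.parse_args()
project = pathlib.Path(args.project).resolve()

with open(project / "poetry.lock", "r") as f:
    lockfile = f.read()

# Shell commands that get chained into a single RUN instruction
root = "/opt/gw-iaas/libs"
commands = ["set +x", "mkdir /opt/lib"]

def add(line):
    commands.append(line)

# Build a wheel for each local dependency and collect it in /opt/lib
for dep in re.findall("<regex for local deps>", lockfile):
    add(f"cd {root}/{dep}")
    add("poetry build")
    add("cp dist/*.whl /opt/lib")

# Build the project itself from the copied build context
add("cd /opt/build")
add("poetry build")

separator = " \\\n    && "
dockerfile = f"FROM build\nCOPY . /opt/build\nRUN {separator.join(commands)}\n"

# Write the generated Dockerfile into the project directory so that the
# project is the build context and COPY . picks up its code. (A Dockerfile
# passed via fileobj is sent without any surrounding context, and docker-py
# rejects StringIO there in favor of BytesIO.)
build_dockerfile = project / "Dockerfile.build"
build_dockerfile.write_text(dockerfile)

client = docker.from_env()
try:
    build_image, _ = client.images.build(
        path=str(project),
        dockerfile="Dockerfile.build",
        tag=f"{project.name}:build",
    )
finally:
    build_dockerfile.unlink()

# path must be the directory containing the Dockerfile, passed as a string
client.images.build(path=str(project), tag=project.name)

# remove() expects an image name or ID rather than an Image object
client.images.remove(build_image.id)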
Individual project Dockerfiles would then include the lines
COPY --from=<project>:build /opt/lib/*.whl .
RUN pip install *.whl && rm *.whl
Advantages
- Unifies and isolates Poetry and Python environments used for builds
- Automates addition of dependencies and project code
- Project builds have local contexts and COPY paths are relative to Dockerfile location
- Only installing wheels reduces the size of images
Disadvantages
- Addition of extra host dependencies
- Easy for CI, but local builds become more complicated (could solve with a Makefile?)
- Haven't tested this so no idea if it will actually work
- Python script obscures what's going into container, makes builds less reproducible (Python script dependent on host environment)
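One way to soften the last disadvantage might be a dry-run mode that renders the generated Dockerfile as text, so it can be inspected or committed without touching Docker. This is a sketch under the same assumptions as the script above (a base image tagged build, libraries under /opt/gw-iaas/libs). The name render_build_dockerfile and the dependency names are hypothetical.

```python
def render_build_dockerfile(deps: list[str], root: str = "/opt/gw-iaas/libs") -> str:
    """Render the intermediate Dockerfile that builds one wheel per dependency."""
    commands = ["set +x", "mkdir /opt/lib"]
    for dep in deps:
        # Build each local library and collect its wheel in /opt/lib.
        commands += [f"cd {root}/{dep}", "poetry build", "cp dist/*.whl /opt/lib"]
    # Finally build the project itself from the copied build context.
    commands += ["cd /opt/build", "poetry build"]
    separator = " \\\n    && "
    return f"FROM build\nCOPY . /opt/build\nRUN {separator.join(commands)}\n"


# Hypothetical dependency names, for illustration only.
print(render_build_dockerfile(["lib-a", "lib-b"]))
```

Because the function only takes the dependency list and returns a string, it does not depend on the host's Docker setup and can be checked in a unit test.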
Draft PR tries to create a hybrid of these solutions by doing the Makefile install in an intermediate image at the top of project-specific Dockerfiles using a global build container, adding three lines of boilerplate that I can live with (the monorepo example mentioned above actually does something like this in tools/cloudbuild.yaml).
Currently running into final wheel install issues stemming from poetry/issues#1168. Will keep monitoring this and trying to come up with workarounds as time permits. Monorepo code manages to get around this, but it's not clear to me how.
Note that while the draft PR contains Python code for reference, this likely won't be part of the final PR, both because it's unnecessary given the current build structure and because it is probably bad Docker practice.