patrickhoefler/dockerfilegraph

Missing support for incremental stages, e.g. FROM final AS final

iurly opened this issue · 3 comments

iurly commented

I have a use case for which I need to add several groups of
artifacts to the final image, each coming from a given stage,
like in the test case where we have download-node-setup and
download-get-pip.
However, I'd like each stage to be self-contained within
a given portion of the Dockerfile. In other words, I don't want all

COPY --from=stageX ...

statements to be at the end of the Dockerfile; I'd rather
keep each one right after the stage that generates the artifacts it copies.

For this I'm essentially using the (perhaps unorthodox) pattern:

FROM baseimage AS final
RUN somestuff

FROM ext1 AS build-foo
RUN whatever1
FROM final AS final
COPY --from=build-foo ...

FROM extN AS build-bar
RUN whateverN
FROM final AS final
COPY --from=build-bar ...

In this case, though, the current implementation will
create N copies of the final stage.
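To make the duplication concrete, here is a small, hypothetical Python model that treats every FROM instruction as its own stage and resolves a stage name to the most recently defined earlier stage with that name. That resolution rule is an assumption of this sketch (it is the behavior the pattern relies on). `resolve_stages` and `FROM_RE` are illustrative names, and this is not how dockerfilegraph or BuildKit are actually implemented.

```python
import re

# The pattern from the issue, with the "..." placeholders kept as-is.
DOCKERFILE = """\
FROM baseimage AS final
RUN somestuff

FROM ext1 AS build-foo
RUN whatever1
FROM final AS final
COPY --from=build-foo ...

FROM extN AS build-bar
RUN whateverN
FROM final AS final
COPY --from=build-bar ...
"""

# Matches "FROM <image> [AS <name>]"; flags such as --platform are not handled.
FROM_RE = re.compile(r"^\s*FROM\s+(\S+)(?:\s+AS\s+(\S+))?\s*$", re.IGNORECASE)


def resolve_stages(text):
    """Return one (index, name, base) tuple per FROM instruction.

    `base` is the index of an earlier stage when the FROM refers to one,
    otherwise the external image name. Assumption: a name resolves to the
    most recently defined earlier stage with that name.
    """
    latest = {}  # stage name -> index of its most recent definition
    stages = []
    for line in text.splitlines():
        m = FROM_RE.match(line)
        if not m:
            continue
        image, name = m.group(1), m.group(2)
        base = latest.get(image, image)  # look up before registering this stage
        index = len(stages)
        stages.append((index, name, base))
        if name:
            latest[name] = index  # later definitions shadow earlier ones
    return stages


for index, name, base in resolve_stages(DOCKERFILE):
    print(index, name, base)
```

It prints five stages: the three named `final` are stages 0, 2 and 4, where stage 2 builds on stage 0 and stage 4 builds on stage 2. A graph that draws one node per FROM instruction therefore shows three `final` nodes, even though they form a single chain.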

PR on its way with my suggested changes.

patrickhoefler commented

Hi @iurly,

Firstly, thank you for your interest in the dockerfilegraph project and for taking the time to submit an issue and PR. Your involvement and suggestions are highly appreciated.

After reviewing the Dockerfile pattern you've provided, I took the liberty of running it through hadolint, which raised DL3024 errors:

Dockerfile:6 DL3024 error: FROM aliases (stage names) must be unique
Dockerfile:11 DL3024 error: FROM aliases (stage names) must be unique

Reusing the same alias for different stages can lead to confusion and unpredictable behavior during image creation, which is why hadolint flags it as an issue.
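A rough approximation of the DL3024 check fits in a few lines of Python. This is a hypothetical sketch, not hadolint's implementation: it only looks at `FROM <image> AS <alias>` lines and reports every alias that an earlier stage already used. The names `duplicate_aliases` and `ALIAS_RE` are illustrative.

```python
import re

# The Dockerfile pattern from the issue; line numbers start at 1.
DOCKERFILE = """\
FROM baseimage AS final
RUN somestuff

FROM ext1 AS build-foo
RUN whatever1
FROM final AS final
COPY --from=build-foo ...

FROM extN AS build-bar
RUN whateverN
FROM final AS final
COPY --from=build-bar ...
"""

# Captures the alias in "FROM <image> AS <alias>" (keywords matched case-insensitively).
ALIAS_RE = re.compile(r"^\s*FROM\s+\S+\s+AS\s+(\S+)\s*$", re.IGNORECASE)


def duplicate_aliases(text):
    """Return (line_number, alias) for every FROM that reuses an earlier alias."""
    seen = set()
    duplicates = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        m = ALIAS_RE.match(line)
        if not m:
            continue
        alias = m.group(1)
        if alias in seen:
            duplicates.append((lineno, alias))
        seen.add(alias)
    return duplicates


for lineno, alias in duplicate_aliases(DOCKERFILE):
    print(f"Dockerfile:{lineno} FROM alias {alias!r} is not unique")
```

Run against the pattern from the issue, it reports lines 6 and 11, the same two lines hadolint flagged above.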

The behavior you observed with dockerfilegraph, where it creates N copies of the final stage, is a direct result of this Dockerfile structure. Our tool interprets the Dockerfile as presented.

Given the issues raised by hadolint and the non-standard pattern in the Dockerfile, I don't think that changing the way dockerfilegraph works in this regard is the best way forward. I'd recommend refactoring the Dockerfile to adhere to Docker best practices, i.e. using unique names for the second and third final stages.

Thank you for your understanding, and I'm keen to hear your thoughts.

iurly commented

Hi @patrickhoefler,

Thank you for your explanation. I did suspect this approach might be unorthodox, but I could not find any definitive reference. I was not aware of hadolint, so thank you for pointing me to it!
I had a look at https://github.com/hadolint/hadolint/wiki/DL3024, and the example provided there looks slightly different. In that case they're effectively defining the build stage twice (each time starting from some other image/stage), so I can understand why that is genuinely undefined.
What I'm trying to do here is essentially "pick up a previous stage again to keep adding layers". Note that the pattern is always FROM final AS final.
I don't know if there's any other way to achieve this, but it seems like a legitimate request with well-defined behavior. Indeed, BuildKit seems to interpret it correctly and, to the best of my knowledge, does not raise any error or warning.
To be honest, I have not investigated this any further, so if you have any other pointers I'd be happy to learn more!

patrickhoefler commented

Hi @iurly,

Thank you for taking the time to detail your perspective and approach.

You're correct: unfortunately, the Dockerfile reference doesn't explicitly state whether build stage names must be unique. However, as best practices evolve, the community tends to identify patterns that can lead to pitfalls or confusion, and using the same alias for different stages is one such pattern.

I understand your intention of "picking up a previous stage again to keep adding layers": in essence, you want to continue from where a certain stage left off. Your approach itself is perfectly fine; it's just the non-unique naming that raises concerns, as it might lead to unpredictable behavior or confuse others reading the Dockerfile.

To align with best practices while achieving your intended pattern, you just need to ensure the stage names are unique. Here's a refactored version of your example:

FROM baseimage AS final-base
RUN somestuff

FROM ext1 AS build-foo
RUN whatever1
FROM final-base AS final-foo
COPY --from=build-foo ...

FROM extN AS build-bar
RUN whateverN
FROM final-foo AS final-bar
COPY --from=build-bar ...

In this refactored version, each final-* stage continues from where the previous one left off, and all stage names are unique, which makes the Dockerfile clearer and keeps it in line with best practices.
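The renaming is mechanical, so for longer Dockerfiles it could be scripted. Below is a hypothetical Python helper, `uniquify_stage_names`, that applies the same idea with numbered suffixes (`final`, `final-1`, `final-2`) instead of descriptive names such as `final-foo`. It is a sketch, not part of dockerfilegraph or hadolint: it only understands simple `FROM <image> [AS <alias>]` and `COPY --from=<stage>` lines, and it does not guard against collisions with names already present in the file.

```python
import re

# The original pattern from the issue ("..." placeholders kept as-is).
ORIGINAL = """\
FROM baseimage AS final
RUN somestuff

FROM ext1 AS build-foo
RUN whatever1
FROM final AS final
COPY --from=build-foo ...

FROM extN AS build-bar
RUN whateverN
FROM final AS final
COPY --from=build-bar ...
"""

# "FROM <image> [AS <alias>]"; flags such as --platform are not handled.
FROM_RE = re.compile(r"^(\s*FROM\s+)(\S+)(?:(\s+AS\s+)(\S+))?(\s*)$", re.IGNORECASE)
COPY_RE = re.compile(r"^\s*COPY\b", re.IGNORECASE)
COPY_FROM_RE = re.compile(r"--from=(\S+)")


def uniquify_stage_names(text):
    """Give every reused stage alias a unique name (hypothetical helper).

    The n-th redefinition of `name` becomes `name-n`; later FROM and
    COPY --from references to `name` are pointed at the newest copy.
    """
    current = {}  # original alias -> name of its most recent definition
    counts = {}   # original alias -> number of definitions seen so far
    out = []
    for line in text.splitlines():
        m = FROM_RE.match(line)
        if m:
            prefix, image, as_kw, alias, suffix = m.groups()
            image = current.get(image, image)  # resolve before renaming
            if alias is None:
                line = f"{prefix}{image}{suffix}"
            else:
                n = counts.get(alias, 0)
                new_alias = alias if n == 0 else f"{alias}-{n}"
                counts[alias] = n + 1
                current[alias] = new_alias
                line = f"{prefix}{image}{as_kw}{new_alias}{suffix}"
        elif COPY_RE.match(line):
            # Point COPY --from at the newest definition of that stage.
            line = COPY_FROM_RE.sub(
                lambda c: "--from=" + current.get(c.group(1), c.group(1)), line
            )
        out.append(line)
    return "\n".join(out) + "\n"


print(uniquify_stage_names(ORIGINAL), end="")
```

On the pattern from the issue it produces the same chain as the hand-written refactor: `final-1` builds on `final`, and `final-2` builds on `final-1`. Since `final-2` is the last stage, a plain `docker build` still produces the complete image; a build that used `--target final` would need to target `final-2` instead.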

I hope this provides some clarity 🙂