blue-build/cli

feat: Support having recipe files stored in `./recipes`

Closed this issue · 13 comments

As I've been working on trying to take advantage of caching during builds as much as possible, I've started to realize that when a user updates any of their recipe files it would cause a cache invalidation for all RUN steps as the config stage would have completely new data. Separating out the recipes to exist in a directory that is not copied into the build would allow only the Containerfile to get changed from changes to the recipe. The config files would end up staying the same, preserving the cache state of the config's stage.

I would plan to continue to support storing recipe's in ./config, but I would print a warning saying that this method is deprecated in favor of storing your recipe's in ./recipes as the old method can interfere with caching.

I agree with this.

Maybe logging like this is useful to implement: "Recipe uses legacy location & it should be moved to recipe folder to retain advantage of layer caching"

Oh, 2nd paragraph.

I'm not sure what this means to be honest. We already have the config/ directory for recipe files and the module configuration files included in them. It is also my understanding that these are not copied to the build, but held in a stage, but I guess that that's enough then for the cache invalidation.

In my view the inclusion of other types of files (non-recipe) in config/ is a kind of structure born from necessity and evolved. Were a breaking change for separating the files to different directories be required, I'd be more inclined to keep the recipe files & co in config/ and move the other files to a different place. The exact details here could be discussed.

It would also be interesting if we could disable the inclusion of certain files from the build stage based on whether they are the current recipe or included in it, so that no breaking changes would be required. But that's probably just a counterproductive dream.

It would also be interesting if we could disable the inclusion of certain files from the build stage based on whether they are the current recipe or included in it, so that no breaking changes would be required. But that's probably just a counterproductive dream.

Way too difficult to determine as a user could have a script try to access any file they have in ./config.

In my view the inclusion of other types of files (non-recipe) in config/ is a kind of structure born from necessity and evolved. Were a breaking change for separating the files to different directories be required, I'd be more inclined to keep the recipe files & co in config/ and move the other files to a different place. The exact details here could be discussed.

I totally understand where this comes from. The reason I'm suggesting to move the recipes is mostly cause it would be less files for end users to migrate and that the config name still applies to all the files that are in there right now. Since the CLI already has to locate each module file when reading a recipe, it would be non-trivial to notify the user about it. It would print out a log message telling the user where this file currently is and where we would suggest moving it to take better advantage of caching. Which speaking of:

It is also my understanding that these are not copied to the build, but held in a stage, but I guess that that's enough then for the cache invalidation.

They aren't copied directly into the final image stage, correct. The problem stems from any change in the stage-config would require any RUN instruction that mounts this stage to also re-run (which at this point is all of them). Since the recipe files are all located in there right now, a change to a later module in the recipe would cause the entire build to be re-run. Whereas with the recipes never being copied into any part of the build, the only change that would occur is the templated Containerfile. If the Containerfile only has a change in a RUN at a later part, all preceding RUNs will be cached due to the fact that nothing changed in the ./config directory. Only the one module did. The actual recipe files aren't used by the modules since we pass in the context of the module config as an argument to the script.

reason I'm suggesting to move the recipes is mostly cause it would be less files for end users to migrate

I'm not sure I agree. The recipe & module files may be scattered in any number of directories at the users discretion, but files used by modules are most likely in the module-specific folders: config/files/ for files, config/gschema-overrides/ for gschema overrides, etc.

I'm not sure I agree. The recipe & module files may be scattered in any number of directories at the users discretion, but files used by modules are most likely in the module-specific folders: config/files/ for files, config/gschema-overrides/ for gschema overrides, etc.

Let me try putting it this way. If we want to have backwards compatibility with copying those files and make them accessible in the build using your strategy, we would need to copy both the new place and the old place to make sure we got all the users files. Only when we decide at some point that it's been long enough we switch over to just copying from the new place, breaking legacy altogether. And until that point we wouldn't be getting any cache benefits because we would be copying the config directory.

Now with my proposal, you would continue to support reading recipes from config as well as the new recipes directory. However, because we're only encouraging moving the recipe files, we don't have to bother copying another directory for all the build files. Those files continue to exist in the one place they've been copied from, the config directory.

I understand wanting to stay with original plans, but strict adherence to what was done in the past will prevent us from moving forward and improving the tool in the long run. My solution would both improve functionality and maintain backwards compatibility

Hmm, ok. What about two new directories then? (names WIP, I dislike both of these)

  1. files/ for non-recipe files
  2. recipes/ for recipe & module files

The config/ folder would then be copied only if it exists and is not empty. To migrate and get the full benefits, a user would move their files to the files/ directory, and rename their config/ directory to recipes/.

I don't want to have the file directory be called configs/ that doesn't make a lot of sense.

Hmm, ok. What about two new directories then? (names WIP, I dislike both of these)

  1. files/ for non-recipe files
  2. recipes/ for recipe & module files

The config/ folder would then be copied only if it exists and is not empty. To migrate and get the full benefits, a user would move their files to the files/ directory, and rename their config/ directory to recipes/.

I don't want to have the file directory be called configs/ that doesn't make a lot of sense.

I can get behind this, though not a fan of files/ mostly because there's already an expectation of having a files/ for the files module. Maybe build_files/?

build_files/ seems especially clunky to me, but I think that files/ is actually fine. Here's the reasoning:

  • The files module can be used to copy files from the repository to the filesystem without any additional processing.
  • Other modules can be used to copy files from the repository to the filesystem with additional processing / automatic destination path determination / etc.
  • For example, including systemd units via the systemd module is essentially the same as copying the same files with the files module from a special directory to a certain path.
  • It is common practice to include files to copy into the root of the repository regardless of they way they are copied into the image, and the separation we have between files of different modules is not a requirement.

build_files/ seems especially clunky to me, but I think that files/ is actually fine. Here's the reasoning:

  • The files module can be used to copy files from the repository to the filesystem without any additional processing.
  • Other modules can be used to copy files from the repository to the filesystem with additional processing / automatic destination path determination / etc.
  • For example, including systemd units via the systemd module is essentially the same as copying the same files with the files module from a special directory to a certain path.
  • It is common practice to include files to copy into the root of the repository regardless of they way they are copied into the image, and the separation we have between files of different modules is not a requirement.

Wouldn't this break current functionality with the modules? My intent was to not introduce any breaking changes with moving the recipe files. We're no longer using the COPY syntax for the files module in favor of the bash one so that we can run ostree container commit on each module run. The change was made here #125 . We need a single directory to hold all the non-recipe files else this plan isn't going to work. We can't just copy the entire root of the repo.

Wouldn't this break current functionality with the modules? My intent was to not introduce any breaking changes with moving the recipe files. We're no longer using the COPY syntax for the files module in favor of the bash one so that we can run ostree container commit on each module run. The change was made here #125 . We need a single directory to hold all the non-recipe files else this plan isn't going to work. We can't just copy the entire root of the repo.

Let me reiterate further. Your idea was to create a new directory called recipes/ and keep the config/ directory for the files. My idea is basically the same as that, except the config/ directory would be deprecated entirely in favor of recipes/ and files/. The cli would then automatically check whether config/ exists, and if so, continuing as it has before. If it doesn't exist, though, it would assume that the user has migrated to recipes/ and files/, and use recipes/ for the configuration and mount files/ where it would have mounted config, and the modules wouldn't know the difference.

To make this cleaner, though, I'd suggest mounting files/in a different directory than where config/ is mounted currently, adding a check for that in the modules required, and refactoring at least the files module to copy not from files/files/but directly from the files/ folder.

@xynydev part of your proposal requires refactoring the modules.

How do you propose this be done? The current modules expect all files to be in sub-directories of /tmp/config (after mounting), are you proposing that this change to /tmp/files or /tmp/config/files? If /tmp/files, how do we ensure backwards compatibility of both modules and of the cli?

One concern I have is that currently the default modules are unconditionally pulled from the :latest tag (https://github.com/blue-build/cli/blob/main/template/templates/stages.j2#L11). I've been considering forking the templates to add the ability to pin to a particular @sha256 hash and/or customize the source image of the default modules to improve caching and reproducibility. Is that something we would entertain as an official feature?

To add a concrete use case, I have a pipeline which builds off of a weekly rawhide snapshot here: https://github.com/prydom/my-ostree-build/blob/cc03e7ff10cea5fa778666bca1eff76022a779db/config/fedora-kinoite-laptop.yml.

My goal is to only ever rebuild the last 3 layers. Currently changes to modules upstream and changes to recipes will cause a rebuild of all layers and require that I download about 2GiB/day of layers. If I only need to download the last "upgrade" layer and any actually changed layers in my recipes, then it's only about 300MiB/day for the week.

@xynydev part of your proposal requires refactoring the modules.

How do you propose this be done? The current modules expect all files to be in sub-directories of /tmp/config (after mounting), are you proposing that this change to /tmp/files or /tmp/config/files? If /tmp/files, how do we ensure backwards compatibility of both modules and of the cli?

If the user is using the would-be old way of structuring their repo, the folder /tmp/config/ would be present. If they weren't, the folder /tmp/files/ would be present. The CLI would prevent these from existing at the same time. The modules would check for the existence of /tmp/config/and use that if present, otherwise using /tmp/files/. Since the migration to the newer standard would require refactoring from the user, other parts of the file handling could be refactored too. My idea would be to make the files module copy directly from /tmp/files/ instead of a subdirectory, to keep the structure cleaner. This would of course require using the bash files module, which was already done in #125.

One concern I have is that currently the default modules are unconditionally pulled from the :latest tag (main/template/templates/stages.j2#L11). I've been considering forking the templates to add the ability to pin to a particular @sha256 hash and/or customize the source image of the default modules to improve caching and reproducibility. Is that something we would entertain as an official feature?

I've thought of this too. Having the modules versioned some way, where the upgrades are automatic unless there's breaking changes (SemVer) would be the best IMO. If you've got ideas for that, you can post them in the https://github.com/blue-build/modules repo. Adding SHA tags would be a good first step and should be totally doable (if our workflow doesn't do it already).