zephyrproject-rtos/west

Extension commands silently missing when `git` isn't installed

MatthewCroughan opened this issue · 22 comments

If you expect to see an extension command like build, but do not have git installed, then west will silently not tell you anything about this on the CLI.

This also means that some of west's functionality is not reproducible, since it depends on the .git folder, which is mutated whenever anybody pushes a branch. This makes it impossible to content-address and fetch the source code alone, whilst retaining the ability to use west extensions, as they depend upon the constantly mutating/changing .git index.

For more background on why .git is not deterministic, there's this comment, which I found quite helpful in understanding it: NixOS/nixpkgs#8567 (comment)

Looks like this is the code that's preventing zephyr modules from being valid unless there is a .git
https://github.com/zephyrproject-rtos/zephyr/blob/main/scripts/zephyr_module.py#L341-L366

then west will silently not tell you anything about this on the CLI.

I'm lost, sorry. Can you please share: 1. reproduction steps, 2. expected results, 3. actual results, like most bug reports do?

Sorry we don't have any bug report template in this repo. Due to the small number of issues filed this hasn't been a big problem so far. So please take inspiration from the zephyr template at https://github.com/zephyrproject-rtos/zephyr/issues/new?assignees=&labels=bug&template=bug_report.md&title=

https://github.com/zephyrproject-rtos/zephyr/blob/main/scripts/zephyr_module.py#L341-L366

This code is interestingly not part of west.

since it depends on the .git folder, which is mutated whenever anybody pushes a branch.

.git/ is absolutely not deterministic; this would be a lost cause indeed. In fact even an apparently READ-ONLY git command can cause changes behind the scenes: https://lkml.org/lkml/2017/9/27/693

This makes it impossible to content-address and fetch the source code alone, whilst retaining the ability to use west extensions, as they depend upon the constantly mutating/changing .git index.

You lost me even more. Please explain much more slowly what you're trying to achieve and include an example.

Is this maybe related to

@marc-hb

  1. west init -m https://github.com/nrfconnect/sdk-nrf --mr v2.1-branch

  2. west update

  3. Notice that you can only use the build scripts if the .git directory exists

    user: matthew in ~/tmp/zephyr
    ❯ west --help | grep build
      build:                compile a Zephyr application
    
    user: matthew in ~/tmp/zephyr
    ❯ rm -rf zephyr/.git
    
    user: matthew in ~/tmp/zephyr
    ❯ west --help | grep build
    
    

This code is interestingly not part of west.

It's still something west interacts with, and I was misplacing blame on it. I later found that west itself is deeply intertwined with .git and does similar redundant checks, in addition to the file I linked also being run when it is sourced.

You lost me even more. Please explain much more slowly what you're trying to achieve and include an example.

I'm trying to build Zephyr with Nix, a reproducible build system. In order to fetch the source code with west, you need to create what is called a "fixed-output derivation". This basically means that you fetch the content once, and then record the recursive sha256 of the directory that is created as a result of running the west commands. This recursive hash of the resulting directory is called a "NAR" (Nix archive) hash. This NAR hash cannot be the same twice if we are fetching the .git folder, as git does not deterministically pack/unpack, so we have to remove .git in order to fetch and store the source with Nix.

This fixed-output derivation can be thought of as a layer in Docker where you might fetch the source code. You would want that layer to be cryptographically reproducible if you were to run it again, which it cannot be if the .git remains in the result. Fixed-output derivations in Nix cannot compile source code; they can only fetch data. This is an important distinction, as it prevents all sorts of cheats that would make a build process unreproducible, and it is why I cannot fetch and compile in the same step. It is better for caching to fetch the source code in a separate "layer" anyway. But for as long as the .git directory is required, we cannot store it on a reproducible filesystem like the /nix/store with Nix, since the content will never be the same twice due to the aforementioned non-determinism in .git.
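To make the "recursive hash of a directory" idea concrete, here is a minimal Python sketch. It is not Nix's NAR serialization and will not produce the same hashes as Nix; it only illustrates why a tree hash is stable once .git is excluded and unstable otherwise. The function name `tree_digest` and its `exclude` parameter are hypothetical, and the sketch ignores file permissions and symlink targets, which a real archive format would have to record.

```python
import hashlib
import os


def tree_digest(root, exclude=(".git",)):
    """Return a SHA-256 hex digest over relative paths and file contents.

    Illustration only: this is NOT the NAR format. Entries whose name is
    listed in `exclude` are skipped, so a changing .git does not affect
    the result.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place and sort so that the
        # traversal order does not depend on the filesystem.
        dirnames[:] = sorted(d for d in dirnames if d not in exclude)
        for name in sorted(filenames):
            if name in exclude:
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            with open(path, "rb") as handle:
                data = handle.read()
            # Feed the path, the content length, and the content, each
            # delimited so that different trees cannot collide trivially.
            digest.update(rel.encode("utf-8") + b"\0")
            digest.update(str(len(data)).encode("ascii") + b"\0")
            digest.update(data)
    return digest.hexdigest()
```

Two checkouts with identical sources give the same digest under this scheme even if their .git directories differ, which is the property a fixed-output derivation relies on.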

Thanks @MatthewCroughan for enlightening me, I think I'm now at the stage where I "know just enough to be dangerous".

mv zephyr/.git zephyr/disabled.git
west --help | grep build # empty

Thanks for the reproduction steps, really helpful.

So the west build command comes from the nrf/west.yml manifest importing west commands located at zephyr/scripts/west_commands/. If you search commands in nrf/west.yml you will find an interesting comment about that.

When zephyr/.git/ is missing, the zephyr import fails and its west build extension command is not found. Now you're raising a very good point: why is west --help silently failing to import? My educated guess is: by design. Simply because you do want certain commands to (mostly) work before west update has been run. First of all west update itself obviously; west --help is a good candidate too because why would you be deprived of --help before having run west update, when you just started using west for the first time?

Note both west status and west list (and probably other commands) do FAIL with the following, loud, expected, verbose and red error message:

FATAL ERROR: failed manifest import in zephyr (zephyr)
  Failed importing "west.yml" from revision "65a99697fa604e28cb26ec96ce935ad720222892"
  Hint: for this to work:
          - zephyr must be cloned
          - its manifest-rev ref must point to a commit with the import data
        To fix, run:
          west update

So you found that building Zephyr depends on west (not in theory strictly speaking but yes in practice), which has a very strong dependency on git. So yes: building Zephyr depends on git.

While Zephyr may be an "extreme" example, the discussion at your link NixOS/nixpkgs#8567 (comment) shows that Zephyr is far from the only build depending on git in practice.

On a personal note I forgot the last time I built anything from a source tarball. Whenever I did, I most likely started by recreating a (temporary) .git/ to track my local changes.

So Nix should solve its .git/ problem once and for all; not just for Zephyr and/or west.

The advice I just left at this link is that Nix should simply learn to ignore .git/. Checksumming everything is great and clearly the future of software development for a ton of very good reasons and especially for security. However checksumming a checksumming system (= git) seems like a really weird idea.

Quick before the world forgets what source tarballs were.

If building Zephyr with west depends on git at build time, then it depends on a non-deterministic and unstable piece of data (the .git folder), which makes it practically impossible to fetch it and then use it to build in Nix. I have been making progress getting it to build with raw cmake without west, which leads to fewer silent failures and more verbosity generally speaking anyway, which is a plus. Though I will have to continue reading the west source code to see how it decides to populate ZEPHYR_MODULES, otherwise I cannot build Zephyr effectively. Do you happen to know how this is done? It's the one piece of information that isn't documented that prevents me from re-implementing it in Nix.

The advice I just left at this link is that Nix should simply learn to ignore .git/

That is what Nix would call an impurity. It's something that can change, outside of our control and knowledge, and which can affect the build, which means Nix will not ignore it. It is bad to allow inputs to change randomly, which is what ignoring the .git would actually achieve, it is non-determinism, plain and simple.

An example where this would be acceptable is SSL CA certificates, where you would allow this small impurity to increase reproducibility when fetching content via http. But .git is not this way, because it can easily affect the build results.

I instead think that west should change and allow specifying the metadata it's trying to gather from the .git explicitly. It seems like the build script only needs the git revision, which is something I already have available. But instead, west decides that I cannot perform the build at all instead of allowing me to specify the missing data.

You're trying hard to make this a west problem but it's really not; it's a much wider problem between Nix and git that is absolutely not west-specific. Granted, projects managed by west have an "even stronger" dependency on git but they're obviously not the only ones; any build system can have a dependency on git too and some do. Nix must simply solve its git problem once and for all and then west won't be a problem either.

I instead think that west should change and allow specifying the metadata it's trying to gather from the .git explicitly.

west's number 1 purpose is to manage multiple git repos https://docs.zephyrproject.org/latest/develop/west/why.html
So asking west to "loosen" and/or make explicit its dependency on git makes no sense, in addition to not solving Nix' larger problem with git projects not using west.

It is bad to allow inputs to change randomly, which is what ignoring the .git would actually achieve, it is non-determinism, plain and simple.

I've never used Nix but based on what I've read about it, Nix' problem with git seems very far from "plain and simple". It rather looks like a problem of overlapping and conflicting features where two different tools are both implementing content-addressable storage: so it does not surprise me that they overlap and conflict a bit with each other and that at least one of them must treat the other as a special case. Considering the huge popularity gap between the two, it's pretty obvious which one must.

Though I will have to continue reading the west source code to see how it decides to populate ZEPHYR_MODULES, otherwise I cannot build Zephyr effectively. Do you happen to know how this is done?

I think you're searching in the wrong project. The west build command is not a native west command, it's rather a zephyr-specific west extension implemented in the zephyr git repo. About ZEPHYR_MODULES more specifically, I have a distant and very vague memory that cmake invokes west list or something like it. https://docs.zephyrproject.org/latest/develop/west/without-west.html which I already mentioned above seems to have a lot of ZEPHYR_MODULES-related information, was that not enough?

You're probably wondering why the preferred way to build Zephyr is through the tool to manage multiple git repos. Good question. This clearly hurts Nix which wants to keep git and the build as far away from each other as possible.

Again the answer is in the first two lines at https://docs.zephyrproject.org/latest/develop/west/why.html
That's because the second purpose of west is, by design, to be a "user-friendly command-line interface for basic Zephyr workflows". Elsewhere in the documentation west is also referred to as the "swiss army knife" tool.

https://docs.zephyrproject.org/latest/develop/west/without-west.html which I already mentioned above seems to have a lot of ZEPHYR_MODULES-related information, was that not enough?

I expect this page to be enough (plus the documentation it links to which describes the behavior of the various related cmake variables). If not, please be explicit about what's missing.

As described by @marc-hb this is not a west issue. West is working as designed. The zephyr build system also works without west installed; you have to do more work to set this up, but it's possible. If you are still confused about how that works after reading the documentation linked above, please ask for help in a more appropriate place:

https://docs.zephyrproject.org/latest/develop/getting_started/index.html#asking-for-help

Is there a way to get west to simply print out what it's setting ZEPHYR_MODULES to? I'm not talking about west list, which I would have to parse to get any useful information out of; I'm asking if I can get the exact value of this variable as the program sees it. I'm considering patching west during the build phase just to get this value, then discarding west and the .git folder entirely, as it would allow me to achieve what I want. If you know of a better way I'd love to know.

Have you looked at cmake/modules/zephyr_module.cmake? Found after 10s of git grep

I have. And I'm wondering how to just get west to dump that variable, since it has all the logic for setting it, rather than reinventing the wheel and even implementing the discovery myself.

I do not understand why you are telling me to look at that file, as if it will solve my problems? My understanding is that different Zephyr projects have a different ZEPHYR_MODULES variable set. And no, it is not clear or simple to understand from the documentation, which I have read fully.

This documentation states that you have to set the ZEPHYR_MODULES environment variable, when you are not using west. It does not tell you how to set the variable, or what the logic that west was using to set it was.

I do not understand why you are telling me to look at that file, as if it will solve my problems?

I told you to look at that file because you hadn't said that you already did.

I'm merely trying to spend limited time helping you achieve things that most Zephyr developers don't want.

It does not tell you how to set the variable, or what the logic that west was using to set it was.

That's because it's 100% application dependent. There is no single thing as "compiling Zephyr". You're supposed to know what your application-specific dependencies are. Don't you?

I found the comment a bit condescending and "RTFM" given that you said "10s from grep". That's why I'm asking why you're pointing me to it with such confidence that it will solve my problem.

That's because it's 100% application dependent.

Right, and that's why west usually sets ZEPHYR_MODULES for you. So I now need to set that myself, using another program. But if west could just print it out explicitly, with something like a west --print-vars option, then it would make my life easier.

I see now why you told me to focus on the cmake file.

# This cmake file provides functionality to import CMakeLists.txt and Kconfig
# files for Zephyr modules into Zephyr build system.
#
# CMakeLists.txt and Kconfig files can reside directly in the Zephyr module or
# in a MODULE_EXT_ROOT.
# The `<module>/zephyr/module.yml` file specifies whether the build files are
# located in the Zephyr module or in a MODULE_EXT_ROOT.
#
# A list of Zephyr modules can be provided to the build system using:
#   -DZEPHYR_MODULES=<module-path>[;<additional-module(s)-path>]
#
# It looks for: <module>/zephyr/module.yml or
#               <module>/zephyr/CMakeLists.txt
# to load the Zephyr module into Zephyr build system.
# If west is available, it uses `west list` to obtain a list of projects to
# search for zephyr/module.yml
#
# If the module.yml file specifies that build files are located in a
# MODULE_EXT_ROOT then the variables:
# - `ZEPHYR_<MODULE_NAME>_CMAKE_DIR` is used for inclusion of the CMakeLists.txt
# - `ZEPHYR_<MODULE_NAME>_KCONFIG` is used for inclusion of the Kconfig
# files into the build system.
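The lookup rule described in that header comment can be sketched in a few lines of Python. This is only an illustration of the rule as the comment states it (a directory counts as a module if it contains zephyr/module.yml or zephyr/CMakeLists.txt), not the actual logic in zephyr_module.cmake or zephyr_module.py. The function names and the idea of passing in a pre-collected list of project directories are assumptions made for this example.

```python
import os


def find_zephyr_modules(project_dirs):
    """Return the directories that look like Zephyr modules.

    A directory qualifies if it contains zephyr/module.yml or
    zephyr/CMakeLists.txt, following the rule quoted from the CMake
    header comment. `project_dirs` is a hypothetical input: any list of
    candidate paths gathered without west.
    """
    modules = []
    for project in project_dirs:
        zephyr_dir = os.path.join(project, "zephyr")
        has_yml = os.path.isfile(os.path.join(zephyr_dir, "module.yml"))
        has_cmake = os.path.isfile(os.path.join(zephyr_dir, "CMakeLists.txt"))
        if has_yml or has_cmake:
            modules.append(os.path.abspath(project))
    return modules


def zephyr_modules_argument(project_dirs):
    """Format the result as -DZEPHYR_MODULES=<path>[;<path>...]."""
    # CMake lists are semicolon-separated; use forward slashes throughout.
    paths = [p.replace(os.sep, "/") for p in find_zephyr_modules(project_dirs)]
    return "-DZEPHYR_MODULES=" + ";".join(paths)
```

Under these assumptions, the string returned by `zephyr_modules_argument` could be passed on the cmake command line in place of the value that the build system would otherwise derive from `west list`.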

If west is available, it uses west list to obtain a list of projects to

It seems like it actually does parse west list using CMake regex? That's very difficult. I could just make a wrapper that makes CMake believe it has west during the build phase, meaning I can just extract the stdout of west.

Is west used for anything else important in the cmake files that this wrapper would break?
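One way to picture the wrapper idea is a small replay stub: record the stdout of the real west once, while .git is still present, then generate a script named west that prints the recording back. The Python sketch below is hypothetical throughout. The name `write_west_stub`, the recording format, and the choice to key recordings on the raw argument string are all assumptions, and it says nothing about which arguments Zephyr's CMake files actually pass to west. The stub exits with an error for anything it has no recording for, so a missing command fails loudly rather than silently.

```python
import json
import os
import stat
import sys

# Source of the generated stub. It looks up the joined command-line
# arguments in a table of recorded outputs and fails loudly on a miss.
STUB_TEMPLATE = """#!{python}
# Hypothetical replay stub: prints output recorded earlier for the same arguments.
import json, sys
recorded = json.loads({table!r})
key = " ".join(sys.argv[1:])
if key not in recorded:
    sys.stderr.write("west stub: no recording for: " + key + "\\n")
    sys.exit(1)
sys.stdout.write(recorded[key])
"""


def write_west_stub(path, recordings, python=sys.executable):
    """Write an executable script at `path` that replays `recordings`.

    `recordings` maps an argument string such as "list" to the stdout
    captured from the real west while .git was still available.
    """
    script = STUB_TEMPLATE.format(python=python, table=json.dumps(recordings))
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(script)
    # Mark the stub executable so it can stand in for west on PATH.
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return path
```

Placing the directory that contains the generated stub at the front of PATH would make the build pick it up instead of the real west. Whether that is enough depends on what else the build asks west for, which is exactly the open question above.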

pointing me to it with such confidence

I have no idea where you saw "confidence" which I clearly stated I don't have on this particular ZEPHYR_MODULES topic.

I spent a significant amount of time trying to help you achieve a non-west objective that no Zephyr developer wants right now. This was, admittedly, partly because I was curious about Nix. I'm not curious anymore and I'm not interested in discussing non-technical topics, sorry.

There are other places where you can discuss Zephyr topics; @mbolivar-nordic listed one. Best of luck!

Just confused about the "Found after 10s of git grep" is all.