openshift-metal3/dev-scripts

Suspect yq doesn't always successfully install on 01_install_requirements.sh L100

Opened this issue · 3 comments

Describe the bug
While running 01_install_requirements.sh, the script stopped with an error saying that yq wasn't found. See the tail of the initial run's log: 01_install_requirements-2024-03-13-104741.log

Version / git show-ref
8d1e4db refs/heads/master
8d1e4db refs/remotes/origin/HEAD

To Reproduce
Executed on a RHEL 8.9 system, expecting a VM-based deployment.
Other than CI_TOKEN, the changes I made to my config_root.sh are listed below:

export NUM_MASTERS=3
export NUM_WORKERS=0
export MASTER_MEMORY=65536
export MASTER_DISK=120
export MASTER_VCPU=16
export NUM_EXTRA_WORKERS=2
export EXTRA_WORKER_VCPU=8
export EXTRA_WORKER_MEMORY=32768
export EXTRA_WORKER_DISK=120
export OPENSHIFT_RELEASE_STREAM=4.14
export IP_STACK=v4
export PROVISIONING_NETWORK_PROFILE=Disabled
export REDFISH_EMULATOR_IGNORE_BOOT_DEVICE=True

Expected/observed behavior
Expected - since yq didn't exist on my system, L100 should have installed it.

Observed - the script failed to find yq on L101. Maybe pip really did install it, and the script just needed a wait or a refresh of some kind before calling it.

Anyway, I manually installed yq via snapd before looking into the code or logs. Re-running the script then got further, as seen in 01_install_requirements-2024-03-13-111338.log, but I later hit other issues which I'm now looking into.

I'm not sure whether the snapd install accounts for both of these (identical?) entries or only for the second one.
$ pip3 list | grep yq
yq 3.2.3
$ yq --version
yq 3.2.3

I believe the real issue is that you don't have the location of yq in your PATH. Looking at the logs, I can see this:
WARNING: The scripts tomlq, xq and yq are installed in '/usr/local/bin' which is not on PATH.
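
One way to confirm this diagnosis is to check whether a directory is one of the colon-separated entries of PATH. This is only a minimal sketch: `path_contains` is a hypothetical helper name and is not part of dev-scripts.

```shell
# Hypothetical helper: return 0 if directory $1 is an entry of the
# colon-separated list $2 (defaults to the current $PATH).
path_contains() {
    dir="$1"
    path_list="${2:-$PATH}"
    # Wrap both sides in colons so that only whole entries match.
    case ":${path_list}:" in
        *":${dir}:"*) return 0 ;;
        *) return 1 ;;
    esac
}

if path_contains /usr/local/bin; then
    echo "/usr/local/bin is on PATH"
else
    echo "/usr/local/bin is NOT on PATH"
fi
```

Wrapping the list in colons avoids false positives such as `/usr/local/bin-old` matching `/usr/local/bin`.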

Good point.
I retested on a fresh RHEL 8.9 system where I only ran dnf install git make -y.
I searched for yq both as an rpm and in pip3 list and found nothing, confirming that the base OS doesn't have yq installed.
I then ran make to start the script, which failed again with the same error.
As you suggested, it managed to install yq but "just" fails to update/reload PATH before trying to execute yq.

I rechecked pip3 list and yq is indeed installed now. I located it:
locate yq
/usr/local/bin/yq
/usr/local/lib/python3.9/site-packages/yq

I printed my current PATH and found, as you said, no reference to '/usr/local/bin':
echo $PATH
/usr/local/sbin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin

Can we fix 01_install_requirements.sh so that it also handles the PATH update/reload?
Adding on L101 something like:

yq_path=$(find / -name yq -type f -exec dirname {} \; 2>/dev/null | head -n 1)
if [ -n "$yq_path" ]; then
    export PATH="$yq_path:$PATH"
    echo "Added $yq_path to PATH"
else
    echo "yq not found"
fi

That covers the case where updating PATH only for the current script session is sufficient. Better yet would be to fix it globally and re-source it, so that PATH is updated for this and any future session/script.
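
A narrower variant of the same idea, for the current session only, would skip the filesystem-wide find and just prepend the directory named in the pip warning when it is missing. This is a sketch under assumptions: `ensure_on_path` is a hypothetical name, and /usr/local/bin is taken from the warning in the log rather than detected.

```shell
# Hypothetical helper: prepend directory $1 to PATH for this session,
# but only if it exists and is not already listed.
ensure_on_path() {
    dir="$1"
    [ -d "$dir" ] || return 1
    case ":${PATH}:" in
        *":${dir}:"*) ;;                          # already present, nothing to do
        *) PATH="${dir}:${PATH}"; export PATH ;;
    esac
}

# /usr/local/bin is the directory reported by the pip warning.
ensure_on_path /usr/local/bin || echo "/usr/local/bin does not exist" >&2

if command -v yq >/dev/null 2>&1; then
    echo "yq found at $(command -v yq)"
else
    echo "yq still not found on PATH" >&2
fi
```

This only affects the running shell and its children, so it does not change PATH for future logins.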

I'm not sure this needs fixing to be honest, considering that if a path is missing you can just add it to your login shell configuration file and you'll get it automatically once you log in again.
Besides that, I find it really odd that the default PATH of your user does not include /usr/local/bin, since it does on my system and on all the systems we use for testing, including the ones based on RHEL8.
Also, seeing that the PATH includes /usr/local/sbin makes me think that something is missing in your system.
As far as I can see, in a new system the PATH does include /usr/local/bin:
echo $PATH
[...]:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin
Are you running dev-scripts as root maybe? Or did you modify your .bashrc or any other shell configuration file in any way?
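
For reference, adding a missing path to the login shell configuration file, as mentioned above, could look roughly like the sketch below. `add_path_line` is a hypothetical helper, and which file to target (for example ~/.bashrc or ~/.bash_profile) is an assumption that depends on the user's shell setup.

```shell
# Hypothetical helper: append an 'export PATH=...' line for directory $1
# to the shell configuration file $2, unless that exact line is already there.
add_path_line() {
    dir="$1"
    rc_file="$2"
    # \$PATH stays literal so it is expanded at login time, not now.
    line="export PATH=\"${dir}:\$PATH\""
    touch "$rc_file"
    grep -qxF "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}
```

Calling `add_path_line /usr/local/bin ~/.bashrc` twice leaves a single line in the file, and the change takes effect at the next login or after sourcing the file.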