helmfile/helmfile

failure in hook should not prevent next hooks to execute

Opened this issue · 17 comments

Operating system

Ubuntu 20.04 LTS

Helmfile Version

v0.160.0

Helm Version

v3.13.2

Bug description

When a hook fails, all next hooks should execute. However, the failure should be propagated to the .Event.Error, so that hooks can decide what to do when an error happened.

I believe that's exactly the point behind .Event.Error.

Example helmfile.yaml

releases:
  - name: raw
    chart: https://github.com/dysnix/charts/releases/download/raw-v0.3.2/raw-v0.3.2.tgz
    wait: true
    hooks:
      - name: first-hook
        events:
          - postsync
        showlogs: true
        command: "false"
      - name: second-hook
        events:
          - postsync
        showlogs: true
        command: "echo"
        args:
          - "this should execute, but it does not"

Error message you've seen (if any)

helmfile sync
Upgrading release=raw, chart=https://github.com/dysnix/charts/releases/download/raw-v0.3.2/raw-v0.3.2.tgz
Release "raw" does not exist. Installing it now.
NAME: raw
LAST DEPLOYED: Tue Dec 26 10:21:27 2023
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None

Listing releases matching ^raw$
raw     default         1               2023-12-26 10:21:27.258526025 -0300 -03 deployed        raw-v0.3.2                 


hook[postsync] logs | 

UPDATED RELEASES:
NAME   CHART                                                                          VERSION   DURATION
raw    https://github.com/dysnix/charts/releases/download/raw-v0.3.2/raw-v0.3.2.tgz   v0.3.2          2s

in ./helmfile.yaml: failed processing release raw: hook[first-hook]: command `false` failed: command "/usr/bin/false" exited with non-zero status:

PATH:
  /usr/bin/false

ARGS:
  0: false (5 bytes)

ERROR:
  exit status 1

EXIT STATUS
  1

Steps to reproduce

See helmfile.yaml above

Working Helmfile Version

N/A

Relevant discussion

No response

@felipecrs In Helmfile, each hook is a self-contained entity, and there is no communication between multiple hooks. If you need to handle errors, it should be done within a single hook. Are there any other real-world scenarios where errors need to be passed between hooks?

In my case, yes.

For all releases I have in my hemfile, I have a presync and a postsync hook that starts collecting some logs (pre) and finishes (post).

Something like:

releases:
  - name: raw
    chart: https://github.com/dysnix/charts/releases/download/raw-v0.3.2/raw-v0.3.2.tgz
    wait: true
    hooks:
      - name: start-logs-capture
        events:
          - presync
        command: ./start_logs_capture.sh
      - name: check-service
        events:
          - postsync
        showlogs: true
        command: "echo"
        args:
          - "this commands checks if service is up, fails otherwise"
      - name: stop-logs-capture
        events:
          - postsync
        command: ./stop_logs_capture.sh

And multiply that pattern to a lot of releases. I even have template releases that automatically adds such start-logs-capture and stop-logs-capture hook to all releases in my helmfile.

However, as you can probably tell, if the "check-service" command fails, the "stop-logs-capture" hook would never execute. And that's not good. :(

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

I still think this is valid.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Still alive, I can share my workaround if someone wants to know. But it's a little complicated.

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Not stale

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

xeor commented

@felipecrs mind sharing your workaround?

Oh. Yes. It's kinda cumbersome. I have a trap EXIT in all hooks and ensure to write /tmp/failure and exit 0.

Then, I make sure the last hook always checks for /tmp/failure and if present exit 1.

I use helmfile in a container so /tmp is isolated.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

Still valid.

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.