allenporter/flux-local

Error with diff in github actions

Opened this issue · 14 comments

Hello!

This error appears intermittently, for no apparent reason, roughly once in every 10 runs.
I'm using version 5.2.0, but I observed this on version 5.1.0 as well.

DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmpeo1pvo2y/repository-config.yaml
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml
DEBUG:flux_local.tool.visitor:Waiting for inflate tasks to complete
DEBUG:flux_local.command:Running command: helm template metrics-server flux-system-bitnami/metrics-server --namespace monitoring --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 7.0.3 --values /tmp/tmps5f_9gcs/monitoring-metrics-server-values.yaml --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml
DEBUG:flux_local.command:Command 'helm template metrics-server flux-system-bitnami/metrics-server --namespace monitoring --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 7.0.3 --values /tmp/tmps5f_9gcs/monitoring-metrics-server-values.yaml --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpbq657u26/flux-system-bitnami-index.yaml: empty index.yaml file

Traceback (most recent call last):
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/flux_local.py", line 61, in main
    asyncio.run(action.run(**vars(args)))
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/diff.py", line 414, in run
    await asyncio.gather(
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/visitor.py", line 309, in inflate
    await asyncio.gather(*tasks)
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/visitor.py", line 237, in inflate_release
    await visitor.func(pathlib.Path(""), release, cmd)
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/tool/visitor.py", line 197, in call_async
    objects = await cmd.objects()
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/kustomize.py", line 128, in objects
    return [doc async for doc in self._docs(target_namespace=target_namespace)]
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/kustomize.py", line 128, in <listcomp>
    return [doc async for doc in self._docs(target_namespace=target_namespace)]
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/kustomize.py", line 118, in _docs
    out = await self.run()
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/kustomize.py", line 112, in run
    return await run_piped(self._cmds)
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/command.py", line 120, in run_piped
    result = await _run_piped_with_sem(cmds)
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/command.py", line 110, in _run_piped_with_sem
    out = await asyncio.wait_for(cmd.run(stdin), _TIMEOUT)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/home/runner/work/_actions/allenporter/flux-local/5.2.0/flux_local/command.py", line 100, in run
    raise self.exc("\n".join(errors))
flux_local.exceptions.HelmException: Command 'helm template metrics-server flux-system-bitnami/metrics-server --namespace monitoring --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 7.0.3 --values /tmp/tmps5f_9gcs/monitoring-metrics-server-values.yaml --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpbq657u26/flux-system-bitnami-index.yaml: empty index.yaml file

flux-local error:  Command 'helm template metrics-server flux-system-bitnami/metrics-server --namespace monitoring --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 7.0.3 --values /tmp/tmps5f_9gcs/monitoring-metrics-server-values.yaml --registry-config /dev/null --repository-cache /tmp/tmpbq657u26 --repository-config /tmp/tmps5f_9gcs/repository-config.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpbq657u26/flux-system-bitnami-index.yaml: empty index.yaml file

I wonder if perhaps this is specific to a certain version of helm. This seems similar to helm/helm#7600, where the helm command is sometimes not resilient to multiple instances running at once.

I have several jobs running in parallel (via a GitHub Actions matrix).
They most likely run on different hosts.

Can you try a newer version of helm and see if that helps?

Hello!
Thanks for the update!

I'll keep an eye on it. On the previous version I ran into the problem on average once in every 10-15 runs.
If anything happens, I will write here.

@allenporter
Unfortunately the problem persists

DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmp5m55fxx_/repository-config.yaml
DEBUG:flux_local.tool.visitor:Waiting for inflate tasks to complete
DEBUG:flux_local.command:Running command: helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml
DEBUG:flux_local.command:Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file

WARNING:asyncio:Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 2381 is closed
Traceback (most recent call last):
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/flux_local.py", line 61, in main
    asyncio.run(action.run(**vars(args)))
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/diff.py", line 414, in run
    await asyncio.gather(
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/visitor.py", line 309, in inflate
    await asyncio.gather(*tasks)
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/visitor.py", line 237, in inflate_release
    await visitor.func(pathlib.Path(""), release, cmd)
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/tool/visitor.py", line 197, in call_async
    objects = await cmd.objects()
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 131, in objects
    return [doc async for doc in self._docs(target_namespace=target_namespace)]
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 131, in <listcomp>
    return [doc async for doc in self._docs(target_namespace=target_namespace)]
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 120, in _docs
    out = await self.run()
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/kustomize.py", line 114, in run
    return await run_piped(self._cmds)
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/command.py", line 122, in run_piped
    result = await _run_piped_with_sem(cmds)
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/command.py", line 110, in _run_piped_with_sem
    out = await asyncio.wait_for(cmd.run(stdin), _TIMEOUT)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
    return fut.result()
  File "/home/runner/work/_actions/allenporter/flux-local/5.4.0/flux_local/command.py", line 100, in run
    raise self.exc("\n".join(errors))
flux_local.exceptions.HelmException: Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file

flux-local error:  Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file

Exception ignored in: <function BaseSubprocessTransport.__del__ at 0x7fd7689a2a70>
Traceback (most recent call last):
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_subprocess.py", line 126, in __del__
    self.close()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_subprocess.py", line 104, in close
    proto.pipe.close()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/unix_events.py", line 746, in close
    self.write_eof()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/unix_events.py", line 732, in write_eof
    self._loop.call_soon(self._call_connection_lost, None)
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 753, in call_soon
    self._check_closed()
  File "/opt/hostedtoolcache/Python/3.10.14/x64/lib/python3.10/asyncio/base_events.py", line 515, in _check_closed

Hi, what version of helm are you using? Thanks!

Hello.

I have the latest version of Helm, but I don't really understand why that matters here.
The diff runs on the GitHub runner and I don't pre-install anything on it.
I'm just using this workflow:

name: "Flux Diff"

on:
  push:
    branches: ["renovate/*"]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.number || github.ref }}
  cancel-in-progress: true

jobs:
  diffs:
    name: Compute diffs
    runs-on: ubuntu-22.04
    steps:
      - name: Setup Flux CLI
        uses: fluxcd/flux2/action@v2.3.0

      - uses: allenporter/flux-local/action/diff@5.4.0
        id: diff
        with:
          live-branch: develop
          path: clusters/path
          resource: helmrelease
          debug: true

      - name: PR Comments
        uses: mshick/add-pr-comment@v2
        if: ${{ steps.diff.outputs.diff != '' }}
        with:
          message-id: ${{ github.ref }}/flux-diff
          message-failure: Unable to post HelmRelease diff
          message: |
            `````diff
            ${{ steps.diff.outputs.diff }}
            `````

What's the "concurrency" about? Does that run in parallel on the same filesystem?

Basically we can't have multiple processes clobbering the local filesystem. Flux build creates temp files that may be getting messed up if two run at once in the same directory.

To do multiple runs at once they may need their own file paths checked out.

All runs execute in parallel, but each one runs in its own GitHub runner container, so they should not affect each other.

That's why I'm confused when I see duplicate log lines:

DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.tool.visitor:Inflating Helm charts in cluster
DEBUG:flux_local.helm:Updating 1 repositories
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml
DEBUG:flux_local.command:Running command: helm repo update --registry-config /dev/null --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmp5m55fxx_/repository-config.yaml
DEBUG:flux_local.tool.visitor:Waiting for inflate tasks to complete
DEBUG:flux_local.command:Running command: helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml
DEBUG:flux_local.command:Command 'helm template external-dns flux-system-bitnami/external-dns --namespace external-dns --repository-cache /tmp/tmpw73lrcdp --repository-config /tmp/tmps6n4o81n/repository-config.yaml --registry-config /dev/null --skip-crds --skip-tests --api-versions policy/v1/PodDisruptionBudget --version 8.0.2 --values /tmp/tmps6n4o81n/external-dns-external-dns-values.yaml' failed with return code 1
Error: no cached repo found. (try 'helm repo update'): error loading /tmp/tmpw73lrcdp/flux-system-bitnami-index.yaml: empty index.yaml file

OK, this still seems consistent with Helm's cache not working with multiple instances in parallel. People say the solution is to use a separate temporary directory for every instance. The reason for a shared repository cache is to avoid pulling the same repositories multiple times, especially when running diffs (everything is loaded twice). We could work around it with a lock held on each repo as a hack, but I'm not necessarily a fan of that. We could also add more controls to tune helm concurrency.
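As a rough illustration of the "separate temporary directory for every instance" option, here is a minimal sketch. The helper names `helm_repo_update_cmd` and `cache_dir_of` and the example config paths are hypothetical and not part of flux-local; only the helm flags are taken from the logs above. The sketch only builds the command lines and does not run helm.

```python
import tempfile
from pathlib import Path


def helm_repo_update_cmd(repo_config: Path) -> list[str]:
    """Build a `helm repo update` command with a fresh, private cache directory.

    Hypothetical helper: each call creates its own cache directory, so two
    concurrent invocations never write to the same index file.
    """
    cache_dir = tempfile.mkdtemp(prefix="helm-cache-")  # unique per call
    return [
        "helm", "repo", "update",
        "--registry-config", "/dev/null",
        "--repository-cache", cache_dir,
        "--repository-config", str(repo_config),
    ]


def cache_dir_of(cmd: list[str]) -> str:
    """Return the value passed to --repository-cache in a built command."""
    return cmd[cmd.index("--repository-cache") + 1]


# Example paths are placeholders for illustration only.
cmd_a = helm_repo_update_cmd(Path("/tmp/a/repository-config.yaml"))
cmd_b = helm_repo_update_cmd(Path("/tmp/b/repository-config.yaml"))
print(cache_dir_of(cmd_a), cache_dir_of(cmd_b))
```

The trade-off is the one described above: every instance downloads its own copy of each repository index, and the directories created by `mkdtemp` have to be cleaned up by the caller.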

I'd prefer if the helm CLI were fixed to be more resilient to running in parallel, of course.
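For reference, the lock idea mentioned above could be sketched roughly as follows. This is not how flux-local is implemented; `with_cache_lock` and the lock registry are hypothetical names, and `fake_update` is a stand-in for `helm repo update` that starts no subprocess. It only shows that two tasks sharing a cache directory are serialized.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")

# Hypothetical registry: one asyncio lock per repository cache directory.
_cache_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)


async def with_cache_lock(cache_dir: str, action: Callable[[], Awaitable[T]]) -> T:
    """Run `action` while holding the lock for `cache_dir`."""
    async with _cache_locks[cache_dir]:
        return await action()


async def _demo() -> list[str]:
    events: list[str] = []

    async def fake_update(name: str) -> None:
        # Stand-in for running `helm repo update`; nothing is executed.
        events.append(f"start {name}")
        await asyncio.sleep(0.01)
        events.append(f"end {name}")

    # Both tasks target the same cache directory, so they run one after another.
    await asyncio.gather(
        with_cache_lock("/tmp/cache", lambda: fake_update("a")),
        with_cache_lock("/tmp/cache", lambda: fake_update("b")),
    )
    return events


events = asyncio.run(_demo())
print(events)
```

Without the lock the two "start" events would be recorded before either "end"; with it, the second update only begins after the first has finished. Note that this only protects tasks inside a single process, not separate helm invocations from different processes.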

Need to think about this more.

Can you confirm the helm version used in your CI? The upstream issue I linked seems to be fixed.

Hi @allenporter,
As you can see from this example, I don't use Helm in my pipelines.

I'm confused; before, you said you were on the latest version of helm? Perhaps there has been a misunderstanding. flux-local does not manage your versions of flux, helm, or kustomize, and you're on your own for setting those up.

So what version of helm is being used here? I may need to close unless we have more detail.
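One way to check, as a small sketch: a script like the one below could be run in a step before the diff action to print whatever helm binary is on the runner's PATH. The function name `helm_version` is hypothetical; it relies only on `helm version --short`.

```python
import shutil
import subprocess
from typing import Optional


def helm_version() -> Optional[str]:
    """Return the output of `helm version --short`, or None if helm is unavailable."""
    helm = shutil.which("helm")
    if helm is None:
        return None  # helm is not on PATH
    result = subprocess.run(
        [helm, "version", "--short"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip() or None


version = helm_version()
print(version)
```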

I understand what you mean.
I don't use Helm in CI myself, but it may be present on the GitHub runner by default.
I use the default runner with Ubuntu 24.

I found it here.
Helm 3.16.3