kubeshop/testkube

v2.1.13 Artifact upload `broken pipe` issues

Describe the bug
With the latest release (2.1.13), our test workflows fail on artifact upload with the following message:

The process has been corrupted: signal: broken pipe
It may be caused by lack of resources on node (i.e. memory or disk space), or external issues.

This issue is intermittent: sometimes the upload succeeds, but most of the time it fails.

To Reproduce

  1. Run test workflow with multiple artifacts to save
  2. Repeat the run several times - it usually works the first time
  3. Observe error being raised on artifact upload
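
Since the failure is intermittent, the repetition in step 2 can be scripted. Below is a minimal sketch, not part of Testkube: `count_broken_pipe` is a hypothetical helper that runs any command a given number of times and counts the runs whose output contains the `signal: broken pipe` message. It ignores the exit status on purpose, because how the CLI exits on an aborted execution is not stated here.

```shell
#!/bin/sh
# count_broken_pipe is a hypothetical helper, not part of the Testkube CLI.
# Usage: count_broken_pipe <attempts> <command> [args...]
# Prints how many runs produced the "signal: broken pipe" message.
count_broken_pipe() {
  attempts="$1"
  shift
  hits=0
  i=1
  while [ "$i" -le "$attempts" ]; do
    # Capture stdout and stderr together; the exit status is ignored
    # because the message text is the only signal we rely on.
    output="$("$@" 2>&1 || true)"
    case "$output" in
      *"signal: broken pipe"*) hits=$((hits + 1)) ;;
    esac
    i=$((i + 1))
  done
  echo "$hits"
}
```

For example, `count_broken_pipe 10 testkube run tw <workflow-name> --watch` would print the number of runs out of 10 that hit the error, assuming the message shows up in the watched output as it does in the logs in this issue.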

Version / Cluster

  • Which testkube version?
    • 2.1.13
  • What Kubernetes cluster?
    • EKS
  • What Kubernetes version?
    • 1.29.3

Screenshots

❯ kubectl testkube get twe 66daf3e3998d551300b626de
<snip>
• (2/2) Upload artifacts
Root: /data
Patterns:
- /data/**/*

artifact_a (7 B)
subfolder/artifact_b (7 B)


The process has been corrupted: signal: broken pipe
It may be caused by lack of resources on node (i.e. memory or disk space), or external issues.

• aborted

Hi @david-polak-ataccama! Sorry for the issue. Could you provide some details:

  • Is this Agent using Control Plane? (Cloud or On-Premise with organizations/environments)
  • What do you use as an Object Storage? (Minio, S3)
  • What is the approximate size and count of the artifacts you write?

Looking at the code, there have been no recent changes that could specifically cause a broken pipe while saving the artifacts.

The only difference is that, in previous versions, a broken pipe caused the step to be marked as failed; now the step is marked as aborted with this message.

  • What version did you upgrade from?
  • Do you have limited resources for this workflow/step? Could you try increasing them?
  • Did you have, on the previous version, artifacts step marked from time to time as failed?

* Is this Agent using Control Plane? (Cloud or On-Premise with organizations/environments)

No

* What do you use as an Object Storage? (Minio, S3)

We are using MinIO

* What is the approximate size and count of the artifacts you write?

It is our smoke tests that are failing, so there are not many artifacts and they are not large:

artifact_a (7 B)
subfolder/artifact_b (7 B)

* Do you have limited resources for this workflow/step? Could you try increasing them?

Yes, we have limits. We bumped them up to 512Mi of memory and 1000m of CPU, with no difference.

* What version did you upgrade from?

testkube-chart 2.1.0 -> 2.1.18

* Did you have, on the previous version, artifacts step marked from time to time as `failed`?

No, we were running on the chart version 2.1.0 for a while and observed no issues with saving artifacts.

I tried one interim version of Testkube, and that one also fails, albeit with a different message. It seems that something changed between Testkube versions 2.0.20 and 2.1.9.

I hope that helps. I might be able to do further bisecting if required.

Attaching logs and the workflow.

The workflow

kind: TestWorkflow
apiVersion: testworkflows.testkube.io/v1
metadata:
  name: mocktests-pytest
  namespace: testkube
spec:
  container:
    resources:
      limits:
        cpu: 1000m
        memory: 512Mi
  steps:
    - name: pytest
      run:
        shell: |
          mkdir -p /data/subfolder
          echo "file a" > /data/artifact_a
          echo "file b" > /data/subfolder/artifact_b
      artifacts:
        workingDir: /data
        paths:
          - '**/*'

Server Version 2.0.20

❮ testkube run tw mocktests-pytest --watch

Context:  (2.1.6)   Namespace: testkube
---------------------------------------
Test Workflow Execution:
Name:                 mocktests-pytest
Execution ID:         66db0742c6546eeadb7e5873
Execution name:       mocktests-pytest-29
Execution namespace:  testkube
Execution number:     29
Requested at:         2024-09-06 13:44:34.190077928 +0000 UTC
Disabled webhooks:    false
Status:               queued

Getting logs from test workflow job 66db0742c6546eeadb7e5873

• Initializing
Configuring state...
Configuring init process...
Configuring shell...

• passed in 453ms

• passed in 682ms
Root: /data
Patterns:
- /data/**/*

artifact_a (7 B)
subfolder/artifact_b (7 B)

Found and uploaded 2 files (14 B).
Took 32ms.

• passed in 987ms

test workflow execution completed with success in 3.81s 🥇

$ Use following command to get test workflow execution details \
        kubectl testkube get twe 66db0742c6546eeadb7e5873

Server Version 2.1.9

❯ testkube run tw mocktests-pytest --watch
Context:  (2.1.6)   Namespace: testkube
---------------------------------------
Test Workflow Execution:
Name:                 mocktests-pytest
Execution ID:         66db07f70c00765fe50551e5
Execution name:       mocktests-pytest-30
Execution namespace:  testkube
Execution number:     30
Requested at:         2024-09-06 13:47:35.096873512 +0000 UTC
Disabled webhooks:    false
Status:               queued

Getting logs from test workflow job 66db07f70c00765fe50551e5
Creating state... done
Initializing state... done
Configuring init process... skipped
Configuring toolkit... done
Configuring shell... skipped

• passed in 944ms
Root: /data
Patterns:
- /data/**/*2024-09-06T13:47:37.003897013Z artifact_a (7 B)
subfolder/artifact_b (7 B)

• failed in 70ms

test workflow execution failed

$ Use following command to get test workflow execution details \
        kubectl testkube get twe 66db07f70c00765fe50551e5

Server Version 2.1.13

❯ testkube run tw mocktests-pytest --watch

Context:  (2.1.6)   Namespace: testkube
---------------------------------------
Test Workflow Execution:
Name:                 mocktests-pytest
Execution ID:         66db08994b1040b36744b73e
Execution name:       mocktests-pytest-31
Execution namespace:  testkube
Execution number:     31
Requested at:         2024-09-06 13:50:17.263745295 +0000 UTC
Disabled webhooks:    false
Status:               queued

Getting logs from test workflow job 66db08994b1040b36744b73e
(SuccessfulCreate) Created pod: 66db08994b1040b36744b73e-w2qwl
(Scheduled) Successfully assigned testkube/66db08994b1040b36744b73e-w2qwl to k3d-default-dev-xlocal-agent-1
(Pulled) Container image "docker.io/kubeshop/testkube-tw-toolkit:2.1.13" already present on machine
Creating state... done
Initializing state... done
Configuring init process... skipped
Configuring toolkit... skipped
Configuring shell... skipped

• passed in 1.159s

• (1/2) Run shell command

• passed in 5ms

• (2/2) Upload artifacts
Root: /data
Patterns:
- /data/**/*2024-09-06T13:50:19.264585879Z artifact_a (7 B)
subfolder/artifact_b (7 B)

The process has been corrupted: signal: broken pipe
It may be caused by lack of resources on node (i.e. memory or disk space), or external issues.

• aborted

test workflow execution aborted

$ Use following command to get test workflow execution details \
        kubectl testkube get twe 66db08994b1040b36744b73e

Thank you, sorry for the issue - I've been able to reproduce it. I'll try to find the solution today.

Lovely, thank you and have a nice weekend!

The bug should be fixed in Testkube Agent v2.1.15, which should be released shortly. Enjoy your weekend, and once again sorry for the problem! 👍

Released. Reopen if it didn't help.