v2.1.13 Artifact upload `broken pipe` issues
Closed this issue · 8 comments
Describe the bug
With the latest release, 2.1.13, our test workflows fail on artifact upload with the following message:
The process has been corrupted: signal: broken pipe
It may be caused by lack of resources on node (i.e. memory or disk space), or external issues.
This issue is intermittent: sometimes the upload succeeds, but most of the time it fails.
To Reproduce
- Run test workflow with multiple artifacts to save
- Repeat the run several times - it usually works the first time
- Observe error being raised on artifact upload
Version / Cluster
- Which testkube version?
- 2.1.13
- What Kubernetes cluster?
- EKS
- What Kubernetes version?
- 1.29.3
Screenshots
❯ kubectl testkube get twe 66daf3e3998d551300b626de
<snip>
• (2/2) Upload artifacts
Root: /data
Patterns:
- /data/**/*
artifact_a (7 B)
subfolder/artifact_b (7 B)
The process has been corrupted: signal: broken pipe
It may be caused by lack of resources on node (i.e. memory or disk space), or external issues.
• aborted
Hi @david-polak-ataccama! Sorry for the issue. Could you provide some details:
- Is this Agent using Control Plane? (Cloud or On-Premise with organizations/environments)
- What do you use as an Object Storage? (Minio, S3)
- What are the approximate size and count of the artifacts you write?
Looking at the code, there have been no recent changes that could cause a broken pipe while saving artifacts. The only difference is that in case of a broken pipe, previous versions marked that step as `failed`, while it is now marked as `aborted` with this message.
- What version did you upgrade from?
- Do you have limited resources for this workflow/step? Could you try increasing them?
- Did you have, on the previous version, the artifacts step marked from time to time as `failed`?
* Is this Agent using Control Plane? (Cloud or On-Premise with organizations/environments)
No
* What do you use as an Object Storage? (Minio, S3)
We are using MinIO
* What is the approximate size and count of the artifacts do you write?
These are our smoke tests that are failing, so there are not many and they are not large:
artifact_a (7 B)
subfolder/artifact_b (7 B)
* Do you have limited resources for this workflow/step? Could you try increasing them?
Yes, we have limits; we bumped them up to 512Mi memory and 1000m CPU with no difference.
* What version did you upgrade from?
testkube-chart 2.1.0 -> 2.1.18
* Did you have, on the previous version, artifacts step marked from time to time as `failed`?
No, we were running on the chart version 2.1.0 for a while and observed no issues with saving artifacts.
I tried one interim version of Testkube and that one also fails, albeit with a different message; it seems that something changed between Testkube versions 2.0.20 and 2.1.9.
I hope that helps, I might be able to do further bisecting as required.
Attaching logs and the workflow.
The workflow
kind: TestWorkflow
apiVersion: testworkflows.testkube.io/v1
metadata:
  name: mocktests-pytest
  namespace: testkube
spec:
  container:
    resources:
      limits:
        cpu: 1000m
        memory: 512Mi
  steps:
    - name: pytest
      run:
        shell: |
          mkdir -p /data/subfolder
          echo "file a" > /data/artifact_a
          echo "file b" > /data/subfolder/artifact_b
      artifacts:
        workingDir: /data
        paths:
          - '**/*'
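For reference, the `**/*` pattern above is a recursive glob relative to `workingDir`, so it picks up both `artifact_a` and `subfolder/artifact_b`. A minimal Python sketch of those glob semantics (illustrative only; Testkube's matcher is its own implementation, and the temp directory here stands in for `/data`):

```python
import glob
import os
import tempfile

# Recreate the workflow's file layout in a temp dir (stand-in for /data).
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "subfolder"))
for rel in ("artifact_a", os.path.join("subfolder", "artifact_b")):
    with open(os.path.join(root, rel), "w") as f:
        f.write("file\n")

# With recursive=True, '**/*' matches everything under root, mirroring
# the workflow's `paths: ['**/*']` relative to workingDir.
matches = sorted(
    os.path.relpath(p, root)
    for p in glob.glob(os.path.join(root, "**", "*"), recursive=True)
    if os.path.isfile(p)
)
print(matches)
```

This is consistent with the two files listed in the upload logs above.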
Server Version 2.0.20
❮ testkube run tw mocktests-pytest --watch
Context: (2.1.6) Namespace: testkube
---------------------------------------
Test Workflow Execution:
Name: mocktests-pytest
Execution ID: 66db0742c6546eeadb7e5873
Execution name: mocktests-pytest-29
Execution namespace: testkube
Execution number: 29
Requested at: 2024-09-06 13:44:34.190077928 +0000 UTC
Disabled webhooks: false
Status: queued
Getting logs from test workflow job 66db0742c6546eeadb7e5873
• Initializing
Configuring state...
Configuring init process...
Configuring shell...
• passed in 453ms
• passed in 682ms
Root: /data
Patterns:
- /data/**/*
artifact_a (7 B)
subfolder/artifact_b (7 B)
Found and uploaded 2 files (14 B).
Took 32ms.
• passed in 987ms
test workflow execution completed with success in 3.81s 🥇
$ Use following command to get test workflow execution details \
kubectl testkube get twe 66db0742c6546eeadb7e5873
Server Version 2.1.9
❯ testkube run tw mocktests-pytest --watch
Context: (2.1.6) Namespace: testkube
---------------------------------------
Test Workflow Execution:
Name: mocktests-pytest
Execution ID: 66db07f70c00765fe50551e5
Execution name: mocktests-pytest-30
Execution namespace: testkube
Execution number: 30
Requested at: 2024-09-06 13:47:35.096873512 +0000 UTC
Disabled webhooks: false
Status: queued
Getting logs from test workflow job 66db07f70c00765fe50551e5
Creating state... done
Initializing state... done
Configuring init process... skipped
Configuring toolkit... done
Configuring shell... skipped
• passed in 944ms
Root: /data
Patterns:
- /data/**/*
artifact_a (7 B)
subfolder/artifact_b (7 B)
• failed in 70ms
test workflow execution failed
$ Use following command to get test workflow execution details \
kubectl testkube get twe 66db07f70c00765fe50551e5
Server Version 2.1.13
❯ testkube run tw mocktests-pytest --watch
Context: (2.1.6) Namespace: testkube
---------------------------------------
Test Workflow Execution:
Name: mocktests-pytest
Execution ID: 66db08994b1040b36744b73e
Execution name: mocktests-pytest-31
Execution namespace: testkube
Execution number: 31
Requested at: 2024-09-06 13:50:17.263745295 +0000 UTC
Disabled webhooks: false
Status: queued
Getting logs from test workflow job 66db08994b1040b36744b73e
(SuccessfulCreate) Created pod: 66db08994b1040b36744b73e-w2qwl
(Scheduled) Successfully assigned testkube/66db08994b1040b36744b73e-w2qwl to k3d-default-dev-xlocal-agent-1
(Pulled) Container image "docker.io/kubeshop/testkube-tw-toolkit:2.1.13" already present on machine
Creating state... done
Initializing state... done
Configuring init process... skipped
Configuring toolkit... skipped
Configuring shell... skipped
• passed in 1.159s
• (1/2) Run shell command
• passed in 5ms
• (2/2) Upload artifacts
Root: /data
Patterns:
- /data/**/*
artifact_a (7 B)
subfolder/artifact_b (7 B)
The process has been corrupted: signal: broken pipe
It may be caused by lack of resources on node (i.e. memory or disk space), or external issues.
• aborted
test workflow execution aborted
$ Use following command to get test workflow execution details \
kubectl testkube get twe 66db08994b1040b36744b73e
Thank you, sorry for the issue - I've been able to reproduce it. I'll try to find the solution today.
Lovely, thank you and have a nice weekend!
The bug should be fixed in Testkube Agent v2.1.15, which should be released shortly. Enjoy your weekend, and once again sorry for the problem! 👍
Released. Reopen if it didn't help.