iterative/cml

Socket hang up when publishing assets

CptCaptain opened this issue · 4 comments

Hi everyone,

Since 2023-08-31, I have consistently been getting this error when creating the report in the final step of my workflow.
Judging by the error, it happens on a cml asset publish call. I only publish a few fairly small images.

***"code":"ECONNRESET","errno":"ECONNRESET","level":"error","message":"request to https://asset.cml.dev/ failed, reason: socket hang up","stack":"FetchError: request to https://asset.cml.dev/ failed, reason: socket hang up\n    at ClientRequest.<anonymous> (/__w/_tool/node/16.20.2/x64/lib/node_modules/@dvcorg/cml/node_modules/node-fetch/lib/index.js:1501:11)\n    at ClientRequest.emit (node:events:525:35)\n    at TLSSocket.socketOnEnd (node:_http_client:518:9)\n    at TLSSocket.emit (node:events:525:35)\n    at endReadableNT (node:internal/streams/readable:1358:12)\n    at processTicksAndRejections (node:internal/process/task_queues:83:21)","type":"system"***

I use a GitHub workflow with self-hosted runners on GCP.

Hello, @CptCaptain! Can you reproduce it with curl instead of cml on the same environment?

curl https://asset.cml.dev \
  --header content-type:text/plain \
  --header content-disposition:inline \
  --data-binary @file.txt \
  --verbose

No, the curl upload succeeded, but I only uploaded a small text file.
Should I retry with an image, or is that unlikely to make a difference?

@CptCaptain, please try with some of your images, and also again with CML; it might be a transient error.

It seems like it wasn't transient per se, but indeed a fault on my end. The directory structure of the dvclive folder changed and my workflow tried to upload directories, which caused the errors. I've now handled that case and it seems to work fine again. Thanks for your help nevertheless!
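
For anyone hitting the same error, here is a minimal sketch of one way to handle that case: collect only regular files before publishing, so directories are never passed to the upload step. The function name `list_uploadable_files`, the `dvclive/plots` path, and the `.png`/`.jpg` filter are assumptions for illustration; the thread does not show the actual workflow or the fix that was applied.

```shell
#!/bin/sh
# Hypothetical helper (not part of CML): print every regular image file
# under the given directory, one path per line. Directories are skipped,
# so a changed folder layout cannot feed a directory to the upload step.
list_uploadable_files() {
  dir="$1"
  # -type f matches regular files only; the extension filter is an assumption.
  find "$dir" -type f \( -name '*.png' -o -name '*.jpg' \) | LC_ALL=C sort
}

# Example use (path is an assumption; replace the echo with your upload command):
# list_uploadable_files dvclive/plots | while IFS= read -r file; do
#   echo "would publish: $file"
# done
```

Nested images are still found because `find` descends into subdirectories; only the directory entries themselves are filtered out.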