containers/conmon

conmon regression in 2.0.24 causing Kubernetes exec probes in cri-o to fail

paulnivin opened this issue · 10 comments

conmon stopped working in Lyft's environment (Fedora 32 w/ cgroup v1, Kubernetes 1.16, cri-o tip of the 1.16 branch w/ cgroupfs) when #222 landed. Specifically, Kubernetes exec probes now always error out.

Running conmon at 0f092d5 results in all exec probes failing with:

remote_runtime.go:351] ExecSync [...] from runtime service failed: rpc error: code = Unknown desc = command error: EOF, stdout: , stderr: , exit code -1

Building and running conmon at the immediately preceding commit, 43377e3, resolves the issue. I also tested current conmon master at 3ac015e, and it does not fix the issue either. I started looking into conmon changes because of kata-containers/runtime#2352, which had similar error messages.

Historically, we haven't pinned conmon; we run the version that ships in Fedora. If it's helpful, the current master branch of cri-o appears to build against conmon 2.0.20. Let us know if any additional data would be useful.

Uh oh! Can you try with #241? I believe it should fix your problem.

cri-o has lagged quite a bit because most of the version churn has been for podman. I will look into bumping conmon in cri-o once we package that PR.

Unfortunately, #241 does not resolve the issue. Exec probes with 3ac015e error out in the same manner as with 0f092d5.

I think #246 may help as well; I was able to reproduce the error, and that PR fixed it.

Confirmed that #246 fixes the issue. Thanks!

Awesome, I'll have 2.0.27 available shortly; it will probably make its way into Fedora in a couple of days.

In the meantime, we'll test cri-o with conmon master and test conmon with the cri-o integration tests (two things I've been meaning to do anyway). That should catch most issues like this earlier.

Heads up: Koji doesn't have an fc32 build yet.

Oops, thanks! I triggered the build.

@haircommander I believe this can be closed, right?

Yes, closing this one for now.