porch: porch-server crash with "Unable to derive new concurrency limits"
arora-sagar opened this issue · 3 comments
Expected behavior
No crashing
Actual behavior
After testing the new image provided in Issue #3958, porch-server crashed after 2.5 days. The crash was unrelated to the original issue. The log lines I find relevant are a couple of errors and warnings:
E0814 05:08:35.283547 1 apf_controller.go:411] "Unable to derive new concurrency limits" err="impossible: ran out of bounds to consider in bound-constrained problem" plNames=[global-default leader-election node-high system workload-high workload-low catch-all] items=[{target:24 lowerBound:24 upperBound:649} {target:25 lowerBound:25 upperBound:625} {target:73 lowerBound:73 upperBound:698} {target:50 lowerBound:50 upperBound:674} {target:49 lowerBound:49 upperBound:698} {target:NaN lowerBound:26 upperBound:845} {target:13 lowerBound:13 upperBound:613}]
W0814 05:08:45.011028 1 repository.go:490] over-notifying of package updates (even on unchanged packages)
Information
Complete logs can be retrieved from here.
kpt version --> 1.0.0-beta.38
porch-server image version --> nephio/porch-server:jbelamaric (experimental)
@johnbelamaric can you please post your logs here?
OK, so this error comes from the API Priority and Fairness (APF) feature in the core K8s API machinery code. I suspect it is new to us because the experimental build upgraded that library from v0.26.0 to v0.26.7. A Google search turned up only one other instance, a recent one that uses v0.26.6 of the apimachinery libraries:
I suspect this is a core K8s bug rather than ours, and that its impact is relatively minimal, but we shall see.
Two more logs: apf-error.tar.gz
Note: this is now fixed upstream in kubernetes/kubernetes#120032.
v0.26.9 was recently released, but that fix was not backported. In fact, I don't see it in any release yet.
There are other APF fixes in v0.26.9, but not this one.