knative/serving

Many services in a single namespace leads to assorted problems.

mattmoor opened this issue · 12 comments

/area API
/area autoscale

What version of Knative?

HEAD

Description of the problem.

I wanted to push on our limits a bit, and so I wrote the very innovative (patent pending 🤣 ) script below. I plotted the latency between creationTimestamp and status.conditions[Ready].lastTransitionTime here.

A few observations in this context:

  1. Revision creation latency creeps up over time.
  2. (anecdotally by curling) the cold start latency of ALL ksvcs creeps up over time (from 2-3s to 10-12s, with all time being spent between "Container create" and "Container start")
  3. After 1198 services were deployed, I started seeing deployments fail with:
RevisionFailed: Revision "foo-1200-sjwyy-1" failed with message: Container failed with: standard_init_linux.go:211: exec user process caused "argument list too long
  1. When the above happened, we stop being able to cold start new services (they crash loop with the same message)!

Here's where it gets interesting... On a whim, I tried picking back up in a second namespace, and things work! Not only do they work, but cold start latency for the new services is back down!

Steps to Reproduce the Problem

I needed a GKE cluster with at least 10 nodes (post-master resize) to tolerate the number of services this creates. I was playing with this in the context of mink, but there's no reason that would affect what I'm seeing.

#!/bin/bash -e

for i in $(seq 1 1500); do
  kn service create foo-$i --image=gcr.io/knative-samples/autoscale-go:0.1
  sleep 10
done

I gathered the latencies as CSV with:

kubectl get ksvc -ojsonpath='{range .items[*]}{.metadata.name},{.metadata.creationTimestamp},{.status.conditions[?(@.type=="Ready")].lastTransitionTime}{"\n"}{end}' | pbcopy

Given the sheer volume of services we create, I have the sneaking suspicion that this is related to the service link environment variables injected by k8s. @dprotaso has a change to allow us to use enableServiceLinks: false here.

I am going to try pulling that in and see if it alleviates these problems. If it does, then I think we have our compelling reason to make this default configurable.

cc @vagababov @markusthoemmes @duglin

Yeah Dave's change should help with this

tshak commented

enableServiceLinks: false still feels like the right default to me, especially if it's causing issues like this. I really can't imagine anyone using knative relying on Docker links behavior. If they do we have a switch now where it can be enabled in the PodSpec. Lest us not forget...

Built by codifying the best practices shared by successful real-world implementations...

@tshak Ack, when this was raised previously we didn't really have tangible evidence that this could cause correctness or performance problems at scale. I am running a test now that should confirm/deny that impact, and then we can revisit in the WG call. However, I think the sentiment from the previous WG call stands: we need to handle this like a change to "stable" behavior.

Assuming we have a smoking gun...

I'd suggest:

  1. Make the default configurable,
  2. Announce the intent to change the default (and why), and advise folks interested to specify an explicit default (and how).
  3. Wait until the release specified in 2.
  4. Change the default and advise folks how to restore the previous behavior.

I don't see a super compelling reason to remove the configurable default (since it enables "K8s compat mode"), but I think these issues would be enough to sway my view on the default behavior.

I'll report back in a few hundred more services 😉

Alright, I still need to confirm that we can push beyond 1200 services without the standard_init_linux.go problem, but I think we have enough to declare a smoking gun.

This is the before picture (effectively enableServiceLinks: true):
image

This is the after picture (enableServiceLinks: false):
image

On top of that, not only is the cold-start latency for services better, but it seems like it is a noticeable improvement over where we are at HEAD today with nothing on the cluster 🤩

time curl http://foo-xdsps.default.ducktagon.io?sleep=100
Slept for 100.13 milliseconds.
real    0m2.016s
user    0m0.007s
sys     0m0.011s


time curl http://foo-z74zf.default.ducktagon.io?sleep=100
Slept for 100.25 milliseconds.
real    0m1.895s
user    0m0.008s
sys     0m0.009s

I deployed a ksvc with the original method on top of these 1000 ksvcs (it was slow!):

kn service create foo-svc-links --image=gcr.io/knative-samples/autoscale-go:0.1
Creating service 'foo-svc-links' in namespace 'default':

  0.119s Configuration "foo-svc-links" is waiting for a Revision to become ready.
  0.131s unsuccessfully observed a new generation
  0.154s Configuration "foo-svc-links" is waiting for a Revision to become ready.
 22.734s ...
 22.861s Ingress has not yet been reconciled.
 23.549s Waiting for load balancer to be ready
 25.077s Ready to serve.

Service 'foo-svc-links' created with latest revision 'foo-svc-links-fxfmm-1' and URL:
http://foo-svc-links.default.ducktagon.io

I waited for this to scale to zero:

time curl http://foo-svc-links.default.ducktagon.io?sleep=100
Slept for 100.14 milliseconds.

real    0m13.042s
user    0m0.008s
sys     0m0.009s

My next step is to keep cranking out services and see if I can push past 1200 (where foo-svc-links should fail) and see if the ksvcs without service links keep working.

Alright, I have confirmed the last piece of this. Above 1200 services, we fail to cold start with service links, but without we still see pretty exceptional cold starts...

time curl http://foo-z9gs2.default.ducktagon.io?sleep=100
Slept for 100.12 milliseconds.

real    0m1.698s
user    0m0.008s
sys     0m0.009s

Overnight I did a fresh run creating 1500 ksvc (with kn as above, but with service links disabled by default thanks to @vagababov), and the Revision creation latency is completely flat:
image

... bear in mind that the original graph was only 1000 ksvc.

So I think we should start a discussion around how we want to change this default behavior. Thoughts?

Closing in favor of #8563

@mattmoor do you happen to remember if this issue was strictly due to K8s Services env vars being injected, or whether it would have happened even if the user defined the same number of env vars manually?

@duglin I'd guess it'd happen with either

ok thanks