GoogleCloudPlatform/continuous-deployment-on-kubernetes

Executors not spinning up successfully: Unknown host for jenkins-ui.jenkins.svc.cluster.local

tobsch opened this issue · 9 comments

Hi there,

I set up a relatively basic Jenkins installation on Kubernetes using the how-to guide.
Everything works fine except that the executors do not spin up properly.
I see the following output from the jnlp container in the agent pods:

```
2017-11-24T16:32:26.322932987Z Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior
2017-11-24T16:32:26.323001581Z Warning: SECRET is defined twice in command-line arguments and the environment variable
2017-11-24T16:32:26.323005364Z Warning: AGENT_NAME is defined twice in command-line arguments and the environment variable
2017-11-24T16:32:26.948522265Z Nov 24, 2017 4:32:26 PM hudson.remoting.jnlp.Main createEngine
2017-11-24T16:32:26.948584652Z INFO: Setting up slave: jnlp-ljlvs
2017-11-24T16:32:26.956588789Z Nov 24, 2017 4:32:26 PM hudson.remoting.jnlp.Main$CuiListener <init>
2017-11-24T16:32:26.956628981Z INFO: Jenkins agent is running in headless mode.
2017-11-24T16:32:26.962338983Z Nov 24, 2017 4:32:26 PM hudson.remoting.Engine startEngine
2017-11-24T16:32:26.962365373Z WARNING: No Working Directory. Using the legacy JAR Cache location: /root/.jenkins/cache/jars
2017-11-24T16:32:27.117739627Z Nov 24, 2017 4:32:27 PM hudson.remoting.jnlp.Main$CuiListener status
2017-11-24T16:32:27.117789622Z INFO: Locating server among [http://jenkins-ui.jenkins.svc.cluster.local:8080/]
2017-11-24T16:32:47.268028122Z Nov 24, 2017 4:32:47 PM hudson.remoting.jnlp.Main$CuiListener error
2017-11-24T16:32:47.268060303Z SEVERE: Failed to connect to http://jenkins-ui.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/: jenkins-ui.jenkins.svc.cluster.local
2017-11-24T16:32:47.268063800Z java.io.IOException: Failed to connect to http://jenkins-ui.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/: jenkins-ui.jenkins.svc.cluster.local
2017-11-24T16:32:47.268068627Z     at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:161)
2017-11-24T16:32:47.268071457Z     at hudson.remoting.Engine.innerRun(Engine.java:495)
2017-11-24T16:32:47.268073853Z     at hudson.remoting.Engine.run(Engine.java:447)
2017-11-24T16:32:47.268076247Z Caused by: java.net.UnknownHostException: jenkins-ui.jenkins.svc.cluster.local
2017-11-24T16:32:47.268078609Z     at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
2017-11-24T16:32:47.268080662Z     at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
2017-11-24T16:32:47.268082800Z     at java.net.Socket.connect(Socket.java:589)
2017-11-24T16:32:47.268084923Z     at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
2017-11-24T16:32:47.268092845Z     at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
2017-11-24T16:32:47.268095293Z     at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
2017-11-24T16:32:47.268097429Z     at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
2017-11-24T16:32:47.268099912Z     at sun.net.www.http.HttpClient.New(HttpClient.java:339)
2017-11-24T16:32:47.268101937Z     at sun.net.www.http.HttpClient.New(HttpClient.java:357)
2017-11-24T16:32:47.268104092Z     at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
2017-11-24T16:32:47.268118150Z     at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
2017-11-24T16:32:47.268121053Z     at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
2017-11-24T16:32:47.268123183Z     at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
2017-11-24T16:32:47.268125423Z     at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:158)
2017-11-24T16:32:47.268127569Z     ... 2 more
```

I assume that the "Failed to connect to ..." error is the key problem.
I opened a bash shell on the master pod, ran
curl http://jenkins-ui.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/
and got back "Jenkins".

So it seems to be working?
Did I miss anything?
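
Running curl on the master only shows that the master can reach itself; the failing lookup happens in the agent pod. A minimal Python sketch that repeats the agent's two steps (resolve the host name, then request `<Jenkins URL>/tcpSlaveAgentListener/`) so a DNS failure can be told apart from an HTTP failure. `probe_agent_listener` is a hypothetical helper, not part of Jenkins; the sketch assumes Python 3 is available in the pod and deliberately ignores proxy environment variables so that only in-cluster DNS and networking are exercised.

```python
import socket
import urllib.request
from urllib.parse import urlparse


def probe_agent_listener(jenkins_url: str, timeout: float = 5.0) -> str:
    """Return "dns-error", "http-error" or "ok" for <jenkins_url>/tcpSlaveAgentListener/.

    Hypothetical diagnostic helper mirroring the two steps in the agent's
    stack trace: name resolution, then an HTTP request.
    """
    parsed = urlparse(jenkins_url)
    port = parsed.port or 80
    try:
        # Step 1: name resolution, the step that raised UnknownHostException.
        socket.getaddrinfo(parsed.hostname, port)
    except socket.gaierror:
        return "dns-error"

    # Step 2: request the listener endpoint, keeping any path prefix in the URL.
    listener = jenkins_url.rstrip("/") + "/tcpSlaveAgentListener/"
    # Assumption: ignore http_proxy and similar variables; connect directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(listener, timeout=timeout) as resp:
            return "ok" if resp.status == 200 else "http-error"
    except OSError:  # URLError, HTTPError and socket timeouts all subclass OSError
        return "http-error"


if __name__ == "__main__":
    print(probe_agent_listener("http://jenkins-ui.jenkins.svc.cluster.local:8080/"))
```

Running it from an agent pod and from the master pod makes it easy to compare where resolution breaks.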

Update: OK, the jnlp agent pods cannot resolve the hostname. But why?
It also seems to work in about 10% of cases. Puzzled.

Tobias
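
Because resolution reportedly succeeds only some of the time, a single lookup from an agent pod proves little. A minimal Python sketch that repeats the lookup and reports the fraction of successes; `resolution_success_rate` is a hypothetical helper, and its `resolve` parameter exists only so a fake resolver can be substituted when testing it.

```python
import socket
import time
from typing import Callable


def resolution_success_rate(
    host: str,
    attempts: int = 20,
    delay_s: float = 0.0,
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> float:
    """Resolve `host` `attempts` times and return the fraction that succeeded."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    successes = 0
    for _ in range(attempts):
        try:
            resolve(host)
            successes += 1
        except OSError:  # socket.gaierror and socket.herror subclass OSError
            pass
        if delay_s:
            time.sleep(delay_s)  # spread attempts out to catch intermittent failures
    return successes / attempts


if __name__ == "__main__":
    rate = resolution_success_rate(
        "jenkins-ui.jenkins.svc.cluster.local", attempts=50, delay_s=0.2
    )
    print(f"resolved in {rate:.0%} of attempts")
```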

We are seeing something very similar at the moment: our Jenkins deployment suddenly regressed and started showing this error on every slave start.

Have you been able to solve it, @tobsch?

I have given up on this for now. However, I had similar problems connecting to the outside world, and what solved them in those cases was running the container as root.
You could try that.

Funny, but it worked.

Right, OK. Another amusing thing is that this appeared all of a sudden, first with a couple of builds and then with all of them… Maybe this is the time to seriously look into replacing Jenkins :)

For the record, I filed https://issues.jenkins-ci.org/browse/JENKINS-48368 upstream.

I guess it's rather a GCP issue?

I had a similar issue because I did not deploy the server into the jenkins namespace; I put it in default instead.
I had to go to Manage Jenkins > Configure System and change the Jenkins URL and Jenkins tunnel settings (they are way at the bottom) to use the new namespace in the DNS names:
http://jenkins-ui.default.svc.cluster.local:8080
jenkins-discovery.default.svc.cluster.local:50000
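
Both values follow the Kubernetes Service DNS pattern `<service>.<namespace>.svc.<cluster domain>`, so moving the master to another namespace changes both. A small Python sketch that derives them for a given namespace, assuming the service names `jenkins-ui` and `jenkins-discovery`, ports 8080 and 50000 as above, and the default cluster domain `cluster.local`; `service_fqdn` and `jenkins_endpoints` are hypothetical helpers.

```python
from typing import Tuple


def service_fqdn(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """Cluster-internal DNS name of a Kubernetes Service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def jenkins_endpoints(
    namespace: str, ui_port: int = 8080, agent_port: int = 50000
) -> Tuple[str, str]:
    """Return (Jenkins URL, Jenkins tunnel) for a master deployed in `namespace`.

    Assumes the hypothetical default service names jenkins-ui and jenkins-discovery.
    """
    url = f"http://{service_fqdn('jenkins-ui', namespace)}:{ui_port}"
    tunnel = f"{service_fqdn('jenkins-discovery', namespace)}:{agent_port}"
    return url, tunnel


if __name__ == "__main__":
    url, tunnel = jenkins_endpoints("default")
    print(url)     # http://jenkins-ui.default.svc.cluster.local:8080
    print(tunnel)  # jenkins-discovery.default.svc.cluster.local:50000
```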

I managed to get this working by setting the master to 1 executor; after that, it spun up slaves properly.

(Manage Jenkins > Configure System > # of executors)

I don't think this helps with the original issue, but I had a similar problem that led me here. In my case, I had configured Jenkins to run under the /jenkins/ path, so I needed to modify the Jenkins URL to include it, e.g. http://jenkins-ui.jenkins.svc.cluster.local:8080/jenkins
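
The agent log in the original report shows the agent requesting `tcpSlaveAgentListener/` relative to the configured Jenkins URL, so a path prefix such as /jenkins has to be part of that URL. A small Python sketch that builds the listener URL to check with curl for a given prefix; `agent_listener_url` is a hypothetical helper and the default port 8080 is taken from this thread.

```python
def agent_listener_url(host: str, port: int = 8080, prefix: str = "") -> str:
    """Build http://<host>:<port>/<prefix>/tcpSlaveAgentListener/ (prefix optional)."""
    trimmed = prefix.strip("/")
    path = f"/{trimmed}/" if trimmed else "/"
    return f"http://{host}:{port}{path}tcpSlaveAgentListener/"


if __name__ == "__main__":
    host = "jenkins-ui.jenkins.svc.cluster.local"
    print(agent_listener_url(host))                     # Jenkins served at the root
    print(agent_listener_url(host, prefix="/jenkins"))  # Jenkins served under /jenkins/
```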

Thanks, folks, and apologies for these issues. We now use the upstream Helm chart, which should resolve these issues and/or provide customization options for these configuration changes:
https://github.com/helm/charts/blob/master/stable/jenkins/values.yaml

I am still running into this issue with a near-stock Jenkins Helm install on Kubernetes 1.11. @viglesiasce, what was the required change?