Executors not spinning up successfully: Unknown host for jenkins-ui.jenkins.svc.cluster.local

Question

tobsch opened this issue 7 years ago · 9 comments

Hi there,

I set up a fairly basic Jenkins installation on Kubernetes using the how-to guide.
Everything works except that the executors do not spin up.
The jnlp containers in the agent pods log the following (newest entries first):


jnlp | 2017-11-24T16:32:47.268129607Z |  
-- | -- | --
jnlp | 2017-11-24T16:32:47.268127569Z | ... 2 more
jnlp | 2017-11-24T16:32:47.268125423Z | at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:158)
jnlp | 2017-11-24T16:32:47.268123183Z | at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:966)
jnlp | 2017-11-24T16:32:47.268121053Z | at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1032)
jnlp | 2017-11-24T16:32:47.268118150Z | at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1138)
jnlp | 2017-11-24T16:32:47.268104092Z | at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1202)
jnlp | 2017-11-24T16:32:47.268101937Z | at sun.net.www.http.HttpClient.New(HttpClient.java:357)
jnlp | 2017-11-24T16:32:47.268099912Z | at sun.net.www.http.HttpClient.New(HttpClient.java:339)
jnlp | 2017-11-24T16:32:47.268097429Z | at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
jnlp | 2017-11-24T16:32:47.268095293Z | at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
jnlp | 2017-11-24T16:32:47.268092845Z | at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
jnlp | 2017-11-24T16:32:47.268084923Z | at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
jnlp | 2017-11-24T16:32:47.268082800Z | at java.net.Socket.connect(Socket.java:589)
jnlp | 2017-11-24T16:32:47.268080662Z | at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
jnlp | 2017-11-24T16:32:47.268078609Z | at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
jnlp | 2017-11-24T16:32:47.268076247Z | Caused by: java.net.UnknownHostException: jenkins-ui.jenkins.svc.cluster.local
jnlp | 2017-11-24T16:32:47.268073853Z | at hudson.remoting.Engine.run(Engine.java:447)
jnlp | 2017-11-24T16:32:47.268071457Z | at hudson.remoting.Engine.innerRun(Engine.java:495)
jnlp | 2017-11-24T16:32:47.268068627Z | at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:161)
jnlp | 2017-11-24T16:32:47.268063800Z | java.io.IOException: Failed to connect to http://jenkins-ui.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/: jenkins-ui.jenkins.svc.cluster.local
jnlp | 2017-11-24T16:32:47.268060303Z | SEVERE: Failed to connect to http://jenkins-ui.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/: jenkins-ui.jenkins.svc.cluster.local
jnlp | 2017-11-24T16:32:47.268028122Z | Nov 24, 2017 4:32:47 PM hudson.remoting.jnlp.Main$CuiListener error
jnlp | 2017-11-24T16:32:27.117789622Z | INFO: Locating server among [http://jenkins-ui.jenkins.svc.cluster.local:8080/]
jnlp | 2017-11-24T16:32:27.117739627Z | Nov 24, 2017 4:32:27 PM hudson.remoting.jnlp.Main$CuiListener status
jnlp | 2017-11-24T16:32:26.962365373Z | WARNING: No Working Directory. Using the legacy JAR Cache location: /root/.jenkins/cache/jars
jnlp | 2017-11-24T16:32:26.962338983Z | Nov 24, 2017 4:32:26 PM hudson.remoting.Engine startEngine
jnlp | 2017-11-24T16:32:26.956628981Z | INFO: Jenkins agent is running in headless mode.
jnlp | 2017-11-24T16:32:26.956588789Z | Nov 24, 2017 4:32:26 PM hudson.remoting.jnlp.Main$CuiListener <init>
jnlp | 2017-11-24T16:32:26.948584652Z | INFO: Setting up slave: jnlp-ljlvs
jnlp | 2017-11-24T16:32:26.948522265Z | Nov 24, 2017 4:32:26 PM hudson.remoting.jnlp.Main createEngine
jnlp | 2017-11-24T16:32:26.323005364Z | Warning: AGENT_NAME is defined twice in command-line arguments and the environment variable
jnlp | 2017-11-24T16:32:26.323001581Z | Warning: SECRET is defined twice in command-line arguments and the environment variable
jnlp | 2017-11-24T16:32:26.322932987Z | Warning: JnlpProtocol3 is disabled by default, use JNLP_PROTOCOL_OPTS to alter the behavior

I assume that "Failed to connect to ..." is the key problem.
I opened a bash shell on the master pod and ran
curl http://jenkins-ui.jenkins.svc.cluster.local:8080/tcpSlaveAgentListener/
and got back "Jenkins".

So it seems to be working?
Did I miss anything?

Update: Okay, the hostname cannot be resolved on the jnlp nodes. But why?
Oddly, it does seem to work in around 10% of cases. Puzzled.
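
One way to put a number on the intermittent resolution described above is to resolve the hostname repeatedly from inside an agent pod and count the successes. The following is a minimal sketch, not part of the original report: the function name `resolution_success_rate`, the default of 20 attempts, and the injectable `resolver` parameter are all assumptions made for illustration.

```python
import socket


def resolution_success_rate(host, port=8080, attempts=20,
                            resolver=socket.getaddrinfo):
    """Return the fraction of attempts in which `host` resolved.

    `resolver` defaults to socket.getaddrinfo; it is a parameter only so
    that the function can be exercised without a real DNS server.
    """
    successes = 0
    for _ in range(attempts):
        try:
            resolver(host, port)
            successes += 1
        except socket.gaierror:
            # Name resolution failed for this attempt; keep counting.
            pass
    return successes / attempts
```

Calling `resolution_success_rate("jenkins-ui.jenkins.svc.cluster.local")` on the master pod and again on an agent pod would show whether the failure rate differs between the two, which helps separate a cluster DNS problem from a Jenkins configuration problem.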

Tobias

Answer 1 · 2017-12-04T16:37:06.000Z

We are seeing something very similar at the moment: our Jenkins deployment suddenly regressed and started showing this error on every slave start.

Have you been able to solve it, @tobsch?

Answer 2 · 2017-12-04T16:58:52.000Z

I gave up on this for now, but I had similar problems with "connecting to the outside world".
What solved it in those cases was running the container as root.
Could you try that?

Funny, but it worked.

Answer 3 · 2017-12-04T17:02:56.000Z

Right, OK. Another amusing thing is that this appeared all of a sudden, first with a couple of builds and then with all builds. Maybe this is the time to seriously look into replacing Jenkins :)

For the record, I filed https://issues.jenkins-ci.org/browse/JENKINS-48368 upstream.

Answer 4 · 2017-12-04T17:06:18.000Z

I guess it's rather a GCP issue?

Answer 5 · 2018-01-05T22:32:26.000Z

I had a similar issue because I did not deploy the server into the jenkins namespace; I put it in default.
I had to go to Manage Jenkins > Configure System and change the Jenkins URL and Jenkins tunnel settings to use the new namespace in the DNS names (they are way at the bottom):
http://jenkins-ui.default.svc.cluster.local:8080
jenkins-discovery.default.svc.cluster.local:50000
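
The two values above follow the usual `<service>.<namespace>.svc.<cluster domain>` pattern for Kubernetes service names. As a rough sketch of how they change with the namespace (the helper names `service_dns_name` and `jenkins_endpoints` are hypothetical, and `cluster.local` is assumed to be the cluster domain, as it is in this thread):

```python
def service_dns_name(service, namespace, cluster_domain="cluster.local"):
    """Build the in-cluster DNS name of a Kubernetes service."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


def jenkins_endpoints(namespace,
                      ui_service="jenkins-ui",
                      discovery_service="jenkins-discovery",
                      http_port=8080,
                      agent_port=50000):
    """Return the Jenkins URL and tunnel values for a given namespace.

    Service names and ports are the ones quoted in this thread; adjust
    them if your deployment uses different ones.
    """
    ui_host = service_dns_name(ui_service, namespace)
    discovery_host = service_dns_name(discovery_service, namespace)
    return {
        "jenkins_url": f"http://{ui_host}:{http_port}",
        "jenkins_tunnel": f"{discovery_host}:{agent_port}",
    }
```

For a deployment in the default namespace, `jenkins_endpoints("default")` yields the two values listed above. If the namespace in the configured names does not match where the services actually live, the name will not resolve.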

Answer 6 · 2018-05-30T16:25:32.000Z

I managed to get this working by changing the master to have 1 executor; after that it properly spun up slaves.

(Manage Jenkins > Configure System > # of executors)

Answer 7 · 2018-06-26T11:44:32.000Z

I don't think this helps with the original issue, but I had a similar problem that led me here. In my case I had configured Jenkins to run under the /jenkins/ path, so I needed to modify the Jenkins URL to include it, e.g. http://jenkins-ui.jenkins.svc.cluster.local:8080/jenkins
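
Judging by the log in the question, the agent contacts `tcpSlaveAgentListener/` relative to the configured Jenkins URL, so a missing path prefix sends it to the wrong address. The sketch below only illustrates that string relationship; `agent_listener_url` is a hypothetical helper, not the actual remoting code.

```python
def agent_listener_url(jenkins_url):
    """Return the listener URL an agent would probe for a given Jenkins URL.

    Assumption: the listener path is appended to the configured URL, as the
    "Locating server among [...]" and "Failed to connect to ..." log lines
    in the question suggest.
    """
    return jenkins_url.rstrip("/") + "/tcpSlaveAgentListener/"
```

With the prefix included, `agent_listener_url("http://jenkins-ui.jenkins.svc.cluster.local:8080/jenkins")` points at `/jenkins/tcpSlaveAgentListener/`, whereas without it the agent would probe `/tcpSlaveAgentListener/` at the root.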

Answer 8 · 2018-07-19T23:39:39.000Z

Thanks, folks, and apologies for these issues. We are now using the upstream Helm chart, which should resolve these issues or at least provide customization options for these config changes:
https://github.com/helm/charts/blob/master/stable/jenkins/values.yaml

Answer 9 · 2018-08-10T23:12:54.000Z

I am still running into this issue with a near-stock Jenkins Helm install on Kubernetes 1.11. @viglesiasce, what was the required change?