🐛Caused by: java.io.IOException: native connect() failed : No such file or directory
henrikmotzkus opened this issue · 10 comments
Describe the bug
Getting "Caused by: java.io.IOException: native connect() failed : No such file or directory" when I try to run a stack
Gaia on Azure AKS
To Reproduce
az login
az account set --subscription $subscriptionid
az group create --resource-group $resourcegroup --location $location
az aks create --resource-group $resourcegroup --name $clusterName --node-count 1 --enable-addons monitoring --generate-ssh-keys
az aks get-credentials --resource-group $resourcegroup --name $clusterName
kubectl apply -f https://raw.githubusercontent.com/henrikmotzkus/AutomationDemo/main/12_Terraform_Gaia/gaia.yaml
Expected behavior
no error
Screenshots
2022-01-06 17:29:09.074 INFO 1 --- [ scheduling-1] io.gaia_app.runner.StepPoller : Step found ffbd90c5-f8ff-4147-ace3-c93ea2cedd1e. Running.
2022-01-06 17:29:09.757 INFO 1 --- [ gaia-runner-1] io.gaia_app.runner.StepRunner : Starting step ffbd90c5-f8ff-4147-ace3-c93ea2cedd1e execution.
2022-01-06 17:29:10.554 ERROR 1 --- [tream-757286532] c.g.d.api.async.ResultCallbackTemplate : Error during callback
java.io.UncheckedIOException: Error while executing Request{method=POST, path=/images/create?fromImage=hashicorp%2Fterraform%3Alatest, body=null, bodyBytes=null, hijackedInput=null, headers={accept=application/octet-stream, content-type=application/json}}
at com.github.dockerjava.okhttp.OkDockerHttpClient.execute(OkDockerHttpClient.java:233) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at com.github.dockerjava.core.DefaultInvocationBuilder.execute(DefaultInvocationBuilder.java:228) ~[docker-java-core-3.2.7.jar!/:na]
at com.github.dockerjava.core.DefaultInvocationBuilder.lambda$executeAndStream$1(DefaultInvocationBuilder.java:269) ~[docker-java-core-3.2.7.jar!/:na]
at java.base/java.lang.Thread.run(Thread.java:832) ~[na:na]
Caused by: java.io.IOException: native connect() failed : No such file or directory
at com.github.dockerjava.okhttp.UnixDomainSocket.connect(UnixDomainSocket.java:157) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at com.github.dockerjava.okhttp.UnixSocketFactory$1.connect(UnixSocketFactory.java:29) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:130) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:263) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.RealCall.execute(RealCall.java:81) ~[okhttp-3.14.9.jar!/:na]
at com.github.dockerjava.okhttp.OkDockerHttpClient$OkResponse.<init>(OkDockerHttpClient.java:256) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at com.github.dockerjava.okhttp.OkDockerHttpClient.execute(OkDockerHttpClient.java:230) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
... 3 common frames omitted
2022-01-06 17:29:10.556 ERROR 1 --- [ gaia-runner-1] .a.i.SimpleAsyncUncaughtExceptionHandler : Unexpected exception occurred invoking async method: public void io.gaia_app.runner.StepRunner.runStep(io.gaia_app.runner.RunnerStep)
java.io.UncheckedIOException: Error while executing Request{method=POST, path=/images/create?fromImage=hashicorp%2Fterraform%3Alatest, body=null, bodyBytes=null, hijackedInput=null, headers={accept=application/octet-stream, content-type=application/json}}
at com.github.dockerjava.okhttp.OkDockerHttpClient.execute(OkDockerHttpClient.java:233) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at com.github.dockerjava.core.DefaultInvocationBuilder.execute(DefaultInvocationBuilder.java:228) ~[docker-java-core-3.2.7.jar!/:na]
at com.github.dockerjava.core.DefaultInvocationBuilder.lambda$executeAndStream$1(DefaultInvocationBuilder.java:269) ~[docker-java-core-3.2.7.jar!/:na]
at java.base/java.lang.Thread.run(Thread.java:832) ~[na:na]
Caused by: java.io.IOException: native connect() failed : No such file or directory
at com.github.dockerjava.okhttp.UnixDomainSocket.connect(UnixDomainSocket.java:157) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at com.github.dockerjava.okhttp.UnixSocketFactory$1.connect(UnixSocketFactory.java:29) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at okhttp3.internal.platform.Platform.connectSocket(Platform.java:130) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:263) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.RealConnection.connect(RealConnection.java:183) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ExchangeFinder.findConnection(ExchangeFinder.java:224) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ExchangeFinder.findHealthyConnection(ExchangeFinder.java:108) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ExchangeFinder.find(ExchangeFinder.java:88) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.Transmitter.newExchange(Transmitter.java:169) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:41) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:94) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:88) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:142) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:117) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:229) ~[okhttp-3.14.9.jar!/:na]
at okhttp3.RealCall.execute(RealCall.java:81) ~[okhttp-3.14.9.jar!/:na]
at com.github.dockerjava.okhttp.OkDockerHttpClient$OkResponse.<init>(OkDockerHttpClient.java:256) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
at com.github.dockerjava.okhttp.OkDockerHttpClient.execute(OkDockerHttpClient.java:230) ~[docker-java-transport-okhttp-3.2.7.jar!/:na]
... 3 common frames omitted
2022-01-06 17:29:14.079 INFO 1 --- [ scheduling-1] io.gaia_app.runner.StepPoller : Polling for pending steps
Hey @henrikmotzkus 👋
Thank you for opening this detailed issue.
For now, the Gaia runner cannot run on Kubernetes, as it needs direct access to a Docker daemon, hence the error.
We're currently working on native Kubernetes support, and I hope we can release a version of the runner that supports Kubernetes in the next few weeks.
I'll notify you in this issue when the feature is available in the runner.
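The `No such file or directory` in the trace is what a Unix domain socket connect reports when the socket file is absent. A minimal shell sketch for confirming this from inside the runner container, assuming the socket is expected at `/var/run/docker.sock`; the function name `docker_socket_status` is hypothetical and not part of Gaia:

```shell
#!/bin/sh
# Hypothetical helper: report whether a Docker socket exists at the given path.
# The default path /var/run/docker.sock is an assumption.
docker_socket_status() {
    sock="${1:-/var/run/docker.sock}"
    # -S is true only if the path exists and is a socket.
    if [ -S "$sock" ]; then
        echo "present"
    else
        echo "missing"
    fi
}
```

If `docker_socket_status` prints `missing` when run in the runner pod (for example through `kubectl exec`), the runner has no Docker daemon to talk to, which would match the error above.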
FYI @juwit @henrikmotzkus: I might have made it work with this manifest, which exposes the Docker socket to the container on Kubernetes (I will still make changes to the manifest, but it connects to the socket at least):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gaia-runner
  labels:
    app: gaia-runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gaia-runner
  template:
    metadata:
      labels:
        app: gaia-runner
    spec:
      containers:
        - name: gaia-runner
          image: gaiaapp/runner:v2.2.0
          ports:
            - containerPort: 8080
          env:
            - name: GAIA_URL
              value: "http://gaia:8080"
            - name: GAIA_RUNNER_API_PASSWORD
              value: "123456"
          volumeMounts:
            - name: dockersock
              mountPath: "/var/run/docker.sock"
      volumes:
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
Hello @amitai-devops
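Just out of interest, which Kubernetes version are you using? Since 1.20, Docker as a container runtime (dockershim) is deprecated, and newer clusters often run containerd instead, so there may be no Docker socket on the node.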
Regards
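Whether the socket workaround can work depends on the container runtime of the nodes. A small sketch for checking it, assuming `kubectl` access to the cluster; the helper names `is_docker_runtime` and `print_node_runtimes` are made up for illustration. The node field `status.nodeInfo.containerRuntimeVersion` holds a value prefixed with the runtime name, such as `docker://` or `containerd://`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a containerRuntimeVersion string names Docker.
is_docker_runtime() {
    case "$1" in
        docker://*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Hypothetical wrapper (needs kubectl and cluster access):
# print "<node name><TAB><runtime version>" for every node.
print_node_runtimes() {
    kubectl get nodes \
        -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
}
```

Calling `print_node_runtimes` and passing the second column of each line to `is_docker_runtime` shows which nodes can still expose a Docker socket through a `hostPath` volume.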
@candidson True, I am using 1.18 on EKS. Regardless, I could have used Docker-in-Docker to force the runner to run inside a Docker runtime. I do not know if the Kubernetes change will affect this, but thanks for pointing it out.
Hello @amitai-devops
From my experience, Docker-in-Docker wouldn't work directly, since you no longer have access to any Docker socket. Perhaps running Docker inside an OCI-based image that also hosts gaia-runner might work. However, the current gaia-runner code expects a Docker socket as well, so this wouldn't work either.
I was working on rewriting the gaia-runner code to use the Podman API, for example. However, I understood from @juwit that he is working on an even better solution that leverages the Kubernetes native APIs directly: gaia-app/runner#56
@candidson I am waiting for a solution, as I also want to solve this. I've actually gone as far as making the runner spin up a container, but I'm having trouble with docker-java: the container doesn't resolve addresses, as if it had no access to a DNS server, even after switching between multiple Terraform Docker images in the module:
[gaia] using image kjmkznr/terraform:latest
[gaia] installing curl
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.15/main: temporary error (try again later)
ERROR: https://dl-cdn.alpinelinux.org/alpine/v3.15/community: temporary error (try again later)
[gaia] cloning https://github.com/terraform-aws-modules/terraform-aws-ec2-instance
Cloning into 'module'...
fatal: unable to access 'https://github.com/terraform-aws-modules/terraform-aws-ec2-instance/': Could not resolve host: github.com
Hey 👋
Yes, it seems that the containers that the runner spins up can't access the internet.
It may be related to a network limitation on your cluster or on the Docker host.
If you use the underlying Docker daemon of a Kubernetes cluster, the daemon will probably not be configured with a bridge network to the host, which may explain the issue. In that case, you may have to create this network.
Can you try to run the following commands on your Docker host?
Test docker networks:
docker network ls
expected output (bridge network is important)
NETWORK ID     NAME      DRIVER    SCOPE
0f096fefbc9f   bridge    bridge    local
d20d9f4727bf   host      host      local
3a91ee5ac87c   none      null      local
If the bridge network doesn't exist, you may try:
docker network create -d bridge bridge-network
Test DNS configuration
docker run --rm -it alpine cat /etc/resolv.conf
expected output (with IP depending on your DHCP configuration)
nameserver 192.168.1.1
Hope it helps diagnose the issue
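The two checks above can also be scripted. This is a sketch, not part of Gaia: the function names `has_default_bridge` and `has_nameserver` are hypothetical, and both only inspect text passed on standard input, so the `docker` CLI is needed only to produce that input.

```shell
#!/bin/sh
# Hypothetical helper: read `docker network ls` output on stdin and succeed
# if a network named "bridge" that uses the bridge driver is listed.
# Data rows have the name in column 2 and the driver in column 3.
has_default_bridge() {
    awk '$2 == "bridge" && $3 == "bridge" { found = 1 } END { exit !found }'
}

# Hypothetical helper: read resolv.conf content on stdin and succeed
# if at least one nameserver entry is present.
has_nameserver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+'
}
```

On the Docker host, `docker network ls | has_default_bridge` covers the network check, and `docker run --rm alpine cat /etc/resolv.conf | has_nameserver` covers the DNS check; a non-zero exit status from either one points at the corresponding problem.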
@juwit You were right. Using a solution similar to yours, I was able to make the Gaia runner work on Kubernetes.
FYI: @henrikmotzkus @candidson
Steps:
- On your kubernetes node, run the following commands to create a bridge network:
cp /etc/docker/daemon.json /etc/docker/daemon_backup.json
echo -e '.bridge="docker0" | ."live-restore"=false' > /etc/docker/jq_script
jq -f /etc/docker/jq_script /etc/docker/daemon_backup.json | tee /etc/docker/daemon.json
systemctl restart docker
- Deploy the runner manifest, and connect the pod to the "host network" of the kubernetes node:
- The Gaia URL must be reachable from the Kubernetes node
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gaia-runner
  labels:
    app: gaia-runner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gaia-runner
  template:
    metadata:
      labels:
        app: gaia-runner
      annotations:
        sidecar.istio.io/inject: "false" # remove all sorts of service mesh configurations that could interfere
    spec:
      hostNetwork: true # note this line
      containers:
        - name: gaia-runner
          image: gaiaapp/runner:v2.2.0
          ports:
            - containerPort: 8080
          env:
            - name: GAIA_URL
              value: "https://gaia.your.url"
            - name: GAIA_RUNNER_API_PASSWORD
              value: "123456"
          volumeMounts:
            - name: dockersock
              mountPath: "/var/run/docker.sock"
      volumes:
        - name: dockersock
          hostPath:
            path: /var/run/docker.sock
See Gaia Runner output when running a stack:
[gaia] using image hashicorp/terraform:latest
[gaia] installing curl
[gaia] cloning https://github.com/terraform-aws-modules/terraform-aws-ec2-instance
Cloning into 'module'...
[gaia] generating backend configuration
[gaia] generating tfvars variable file
[gaia] running terraform init
Terraform v1.1.3
on linux_amd64
Initializing the backend...
Successfully configured the backend "http"! Terraform will automatically
use this backend unless the backend configuration changes.
Initializing provider plugins...
- Finding hashicorp/aws versions matching ">= 3.72.0"...
- Installing hashicorp/aws v3.72.0...
- Installed hashicorp/aws v3.72.0 (signed by HashiCorp)
Great!
This kind of workaround will not be necessary with the kubernetes runner planned for the next runner release.
I'll close this issue.
Hey there 👋
We've implemented the Kubernetes executor in the latest Runner version (2.3.0). I think this will help you, as the Runner no longer needs a Docker engine and can interact directly with the Kubernetes API.
Here are some links:
- documentation: https://docs.gaia-app.io/configuration/runner-configuration/#kubernetes-runner
- sample helm chart to help you get started: https://github.com/gaia-app/chart