RDS server running in Kubernetes fails to start
networkop opened this issue · 2 comments
Describe the bug
When running the cloudprober RDS server inside Kubernetes, it may fail to start with the following error:
Error initializing cloudprober. Err: error while creating listener for default gRPC server: listen tcp :9314: bind: address already in use
Cloudprober Version
v0.10.7
To Reproduce
Install Cloudprober with the following manifest:
apiVersion: v1
data:
  cloudprober.cfg: |-
    grpc_port: 9314
    rds_server {
      provider {
        kubernetes_config {
          namespace: "metrics"
          services {}
          clusters {}
        }
      }
    }
kind: ConfigMap
metadata:
  name: cloudprober-config
  namespace: metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudprober
  namespace: metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudprober
  template:
    metadata:
      labels:
        app: cloudprober
    spec:
      volumes:
      - name: cloudprober-config
        configMap:
          name: cloudprober-config
      containers:
      - name: cloudprober
        image: cloudprober/cloudprober
        command: ["/cloudprober"]
        args: [
          "--config_file","/cfg/cloudprober.cfg",
          "--logtostderr"
        ]
        volumeMounts:
        - name: cloudprober-config
          mountPath: /cfg
        ports:
        - name: http
          containerPort: 9313
        - name: grpc
          containerPort: 9314
---
apiVersion: v1
kind: Service
metadata:
  name: cloudprober
  namespace: metrics
  labels:
    app: cloudprober
spec:
  ports:
  - port: 9314
    protocol: TCP
    targetPort: grpc
  selector:
    app: cloudprober
  type: ClusterIP
Additional context
The problem seems to be caused by the name of the service. It seems that cloudprober automatically binds the HTTP server to the address found in the CLOUDPROBER_PORT environment variable:
https://github.com/google/cloudprober/blob/master/cloudprober.go#L108
When running in RDS mode, I'm not really interested in the HTTP port and only want to expose gRPC. However, if I name the service cloudprober, Kubernetes injects the environment variable CLOUDPROBER_PORT=tcp://10.0.144.117:9314 into the pod, and the HTTP server then uses it. So by the time the code gets to the gRPC part, the port is already in use and the binary crashes with the above error.
The workaround is simple: rename the service to something like cloudprober-grpc so that it doesn't get mistaken for the HTTP server. However, it seems this could also be solved by refactoring the code, e.g. by making sure that if grpc_port is defined, nothing else tries to bind to it.
Ah, of course. I think we should simply not have used the CLOUDPROBER_PORT environment variable when it is in the Kubernetes <SERVICE>_PORT format, tcp://..:<port>:
Line 87 in f5fe722
This was done to fix issue #252, but just ignoring CLOUDPROBER_PORT when it is not an integer would have been good enough as well.
Thanks @networkop for spotting and debugging this.
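To illustrate the "ignore CLOUDPROBER_PORT if not an integer" idea, here is a rough Go sketch. It is not the actual change that went into cloudprober. The helper name httpPortFromEnv is hypothetical, and the fallback of 9313 is simply the http containerPort from the manifest in the report, used here as an assumed default.

```go
package main

import (
	"fmt"
	"strconv"
)

// assumedDefaultHTTPPort mirrors the http containerPort in the manifest
// from the report; treating it as the default is an assumption.
const assumedDefaultHTTPPort = 9313

// httpPortFromEnv is a hypothetical helper. It accepts the raw value of
// CLOUDPROBER_PORT and returns it only when it is a plain integer in the
// valid TCP port range. Anything else, including the Kubernetes-injected
// "tcp://<ip>:<port>" form, is ignored in favour of the fallback.
func httpPortFromEnv(value string, fallback int) int {
	port, err := strconv.Atoi(value)
	if err != nil || port <= 0 || port > 65535 {
		return fallback
	}
	return port
}

func main() {
	for _, v := range []string{"", "8080", "tcp://10.0.144.117:9314"} {
		fmt.Printf("%q -> %d\n", v, httpPortFromEnv(v, assumedDefaultHTTPPort))
	}
}
```

With this approach a service named cloudprober no longer affects the HTTP port, because the injected tcp://... value fails the integer check and the fallback is used instead.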
This should now be fixed in HEAD (after #477).
I am going to include it in v0.11.0 (due tomorrow).
Thank you once again @networkop for reporting this, and with excellent details.