google/cloudprober

RDS server running in Kubernetes fails to start

networkop opened this issue · 2 comments

Describe the bug
When running cloudprober RDS inside Kubernetes, it may fail to start with the following error:

Error initializing cloudprober. Err: error while creating listener for default gRPC server: listen tcp :9314: bind: address already in use

Cloudprober Version
v0.10.7

To Reproduce

Install Cloudprober with the following manifest:

apiVersion: v1
data:
  cloudprober.cfg: |-
    grpc_port: 9314
    rds_server {
      provider {
        kubernetes_config {
          namespace: "metrics"
          services {}
          clusters {}
        }
      }
    }
kind: ConfigMap
metadata:
  name: cloudprober-config
  namespace: metrics
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudprober
  namespace: metrics
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudprober
  template:
    metadata:
      labels:
        app: cloudprober
    spec:
      volumes:
      - name: cloudprober-config
        configMap:
          name: cloudprober-config
      containers:
      - name: cloudprober
        image: cloudprober/cloudprober
        command: ["/cloudprober"]
        args: [
          "--config_file","/cfg/cloudprober.cfg",
          "--logtostderr"
        ]
        volumeMounts:
        - name: cloudprober-config
          mountPath: /cfg
        ports:
        - name: http
          containerPort: 9313
        - name: grpc
          containerPort: 9314
---
apiVersion: v1
kind: Service
metadata:
  name: cloudprober
  namespace: metrics
  labels:
    app: cloudprober
spec:
  ports:
  - port: 9314
    protocol: TCP
    targetPort: grpc
  selector:
    app: cloudprober
  type: ClusterIP

Additional context
The problem seems to be caused by the name of the Service. It looks like cloudprober automatically binds its HTTP server to the port found in the CLOUDPROBER_PORT environment variable:
https://github.com/google/cloudprober/blob/master/cloudprober.go#L108

When running in RDS mode, I'm not really interested in the HTTP port; I want to expose gRPC. However, naming a Service cloudprober makes Kubernetes inject the environment variable CLOUDPROBER_PORT=tcp://10.0.144.117:9314 into the pod, and the HTTP server picks it up. So by the time the code reaches the gRPC part, the port is already in use and the binary crashes with the error above.
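The variable name comes from the Service name. Kubernetes upper-cases the name, converts dashes to underscores, and sets a Docker-links-style <NAME>_PORT variable with the value tcp://<cluster-ip>:<port>. It does this for Services that already exist in the pod's namespace when the pod starts, as long as enableServiceLinks is true, which is the default. The sketch below derives that name. The helper name serviceLinkVar is hypothetical and not part of any Kubernetes client library.

```go
package main

import (
	"fmt"
	"strings"
)

// serviceLinkVar is a hypothetical helper that mirrors the Kubernetes naming
// rule for service-link variables: upper-case the Service name, turn dashes
// into underscores, and append "_PORT".
func serviceLinkVar(serviceName string) string {
	return strings.ToUpper(strings.ReplaceAll(serviceName, "-", "_")) + "_PORT"
}

func main() {
	fmt.Println(serviceLinkVar("cloudprober"))      // CLOUDPROBER_PORT
	fmt.Println(serviceLinkVar("cloudprober-grpc")) // CLOUDPROBER_GRPC_PORT
}
```

This is why renaming the Service avoids the clash. A Service named cloudprober-grpc yields CLOUDPROBER_GRPC_PORT, which cloudprober does not read.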

The workaround is simple -- rename the Service to something like cloudprober-grpc so that its port isn't mistaken for the HTTP server's. However, this could also be solved in code, e.g. by making sure that if grpc_port is defined, nothing else is bound to that port.
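The code-side suggestion could look roughly like the sketch below. If the port taken from the environment equals grpc_port, the HTTP server falls back to its default port. The function pickHTTPPort, its parameters, and errPortClash are all hypothetical. The default of 9313 is taken from the http containerPort in the manifest above.

```go
package main

import (
	"errors"
	"fmt"
)

// errPortClash is a hypothetical error for when no usable HTTP port remains.
var errPortClash = errors.New("HTTP port would collide with grpc_port")

// pickHTTPPort is a hypothetical helper, not cloudprober's code. envPort is
// the port derived from CLOUDPROBER_PORT (0 if unset), grpcPort is the
// configured grpc_port (0 if unset), and defaultPort is the built-in HTTP port.
func pickHTTPPort(envPort, grpcPort, defaultPort int) (int, error) {
	candidate := envPort
	if candidate == 0 {
		candidate = defaultPort
	}
	if grpcPort == 0 || candidate != grpcPort {
		return candidate, nil
	}
	// The candidate clashes with gRPC; fall back to the default if it differs.
	if defaultPort != grpcPort {
		return defaultPort, nil
	}
	return 0, errPortClash
}

func main() {
	// CLOUDPROBER_PORT resolved to 9314, the same as grpc_port.
	fmt.Println(pickHTTPPort(9314, 9314, 9313)) // 9313 <nil>
}
```

Returning an error when even the default collides keeps the failure explicit. Without it, the process would fail later with the less obvious bind error.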

Ah, of course. I think we should simply not have used the CLOUDPROBER_PORT environment variable when it is in the Kubernetes <SERVICE>_PORT format (tcp://..:<port>), which the code currently handles here:

if strings.HasPrefix(portStr, "tcp://") {

That handling was added to fix issue #252, but simply ignoring CLOUDPROBER_PORT when it is not an integer would have been good enough as well.

Thanks @networkop for spotting and debugging this.

This should now be fixed in HEAD (after #477).

I am going to include it in v0.11.0 (due tomorrow).

Thank you once again @networkop for reporting this, and in such excellent detail.