roadrunner-server/roadrunner

[💡 FEATURE REQUEST]: Simplifying the health check of the entire Application Server

Closed this issue · 5 comments

Plugin: Status

I have an idea!

I propose reconsidering how server health is determined.

Problem:

According to the documentation, you need to list the plugins to check.

Whenever a plugin is enabled or disabled in a project, the health-check address has to be updated as well, so the list of plugins must be kept in sync manually. For example: http://127.0.0.1:2114/health?plugin=http&plugin=grpc.

The problem grows when developing a large number of services. For example, I want a standard Kubernetes deployment with standard health checks for all services in the company.
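To make the synchronization burden concrete, here is a minimal Go sketch of how a deployment tool would have to assemble the address from a per-service plugin list. The helper name `healthURL` is hypothetical; only the address and the `plugin` query parameter come from the example above.

```go
package main

import (
	"fmt"
	"net/url"
)

// healthURL is a hypothetical helper: it builds the status endpoint address
// for an explicit list of plugins. Every service needs its own list, and the
// list has to change whenever a plugin is enabled or disabled.
func healthURL(base string, plugins []string) string {
	q := url.Values{}
	for _, p := range plugins {
		q.Add("plugin", p) // repeated "plugin" keys, one per plugin
	}
	return base + "/health?" + q.Encode()
}

func main() {
	fmt.Println(healthURL("http://127.0.0.1:2114", []string{"http", "grpc"}))
	// prints: http://127.0.0.1:2114/health?plugin=http&plugin=grpc
}
```

A service without the HTTP plugin would need a different list, and therefore a different probe path.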

k8s deployment fragment:

...
readinessProbe:
  httpGet:
    path: /health?plugin=http&plugin=grpc
    port: 2114
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /health?plugin=http&plugin=grpc
    port: 2114
  initialDelaySeconds: 15
  periodSeconds: 20
...

Different services may have different sets of plugins and different addresses for health checks. It is possible that some of the services will not have an HTTP plugin at all.

The need to explicitly specify the list of plugins in the parameters makes it impossible to standardize such checks. You will have to manually control this list, which will inevitably lead to "forgotten" plugins and unreliable health information.

Proposal:

Define the concept of "application server is healthy". Make a single source for determining the health of the entire application server.

For example: a request to /health without parameters succeeds if all enabled plugins are healthy. The list of enabled plugins can be easily calculated in a future version of RR (see [🧹 CHORE]: RoadRunner v2025 thoughts "Add enabled=true/false to the plugins' configuration").

Currently, that request returns an HTTP 400 Bad Request error: "No plugins provided in query. Query should be in form of: health?plugin=plugin1&plugin=plugin2". Backward compatibility will not be broken, since the endpoint cannot be used without parameters today.
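The proposed semantics could be sketched roughly as follows in Go. This is an illustration under assumptions, not RoadRunner's implementation: the `Checker` interface, the `registry` map, the `probe` helper, and the 503 status for an unhealthy or unknown plugin are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// Checker is a hypothetical interface for a plugin that can report its health.
type Checker interface {
	Healthy() bool
}

// staticChecker is a stand-in plugin with a fixed health state.
type staticChecker bool

func (s staticChecker) Healthy() bool { return bool(s) }

// healthHandler checks the plugins named in ?plugin=...; when no parameter
// is given it checks every registered plugin (the proposed behavior).
func healthHandler(registry map[string]Checker) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		names := r.URL.Query()["plugin"]
		if len(names) == 0 {
			for name := range registry {
				names = append(names, name)
			}
		}
		for _, name := range names {
			c, ok := registry[name]
			if !ok || !c.Healthy() {
				// Assumption: unknown and unhealthy plugins both yield 503.
				http.Error(w, "plugin "+name+" is not healthy", http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	}
}

// probe sends one GET request to the handler and returns the status code.
func probe(h http.Handler, target string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code
}

func main() {
	h := healthHandler(map[string]Checker{
		"http": staticChecker(true),
		"grpc": staticChecker(false),
	})
	fmt.Println(probe(h, "/health"))             // 503: grpc is unhealthy
	fmt.Println(probe(h, "/health?plugin=http")) // 200: only http is checked
}
```

Requests that already pass `plugin` parameters keep their meaning, which is why the change stays backward compatible.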

A single health check source will:

  • provide a single and reliable source of information
  • simplify and standardize deployments
...
readinessProbe:
  httpGet:
    path: /health
    port: 2114
  initialDelaySeconds: 5
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /health
    port: 2114
  initialDelaySeconds: 15
  periodSeconds: 20
...

Yeah, good suggestion!

It would be great to have that option.

This improvement will save me from headaches :)

Hey @Kaspiman 👋
I understand what's happening here; besides, a plain /health endpoint is common practice.
RR's internal mechanism for deducing active plugins works without the enable=true/false configuration option, so it'd be easy to get all plugins implementing health checks.
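One way such a deduction can work in Go is a type assertion against a health-check interface. The sketch below is only an illustration of that idea: `StatusChecker`, the plugin types, and `checkable` are hypothetical names, not RoadRunner's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// StatusChecker is a hypothetical interface that health-aware plugins implement.
type StatusChecker interface {
	Status() (int, error)
}

// httpPlugin implements StatusChecker.
type httpPlugin struct{}

func (httpPlugin) Status() (int, error) { return 200, nil }

// logsPlugin has no Status method, so it is skipped by the health check.
type logsPlugin struct{}

// checkable returns the sorted names of the active plugins that implement
// StatusChecker; no enable=true/false option is consulted.
func checkable(active map[string]any) []string {
	var names []string
	for name, p := range active {
		if _, ok := p.(StatusChecker); ok {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	active := map[string]any{"http": httpPlugin{}, "logs": logsPlugin{}}
	fmt.Println(checkable(active)) // prints: [http]
}
```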

I'll also update the output of the endpoint in case of failure (non-healthy plugin(s)) to be in JSON form so it can be easily parsed, instead of just a text representation. Something like this:

[
    {
        "<plugin_name>": {
            "status": 200,
            "error": "error message (only in case of a non-200 status)"
        }
    }
]
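A client could consume such a payload along these lines. This is a sketch under assumptions: it treats the shape above as a valid JSON array of single-key objects with `status` and `error` fields, which may differ from what the release actually returns. The `PluginStatus` type, the `unhealthy` helper, and the sample error text are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PluginStatus mirrors one entry of the sketched payload (assumed field names).
type PluginStatus struct {
	Status int    `json:"status"`
	Error  string `json:"error,omitempty"`
}

// unhealthy parses the payload and returns plugin name -> error message
// for every plugin whose status is not 200.
func unhealthy(body []byte) (map[string]string, error) {
	var entries []map[string]PluginStatus
	if err := json.Unmarshal(body, &entries); err != nil {
		return nil, err
	}
	failed := make(map[string]string)
	for _, entry := range entries {
		for name, st := range entry {
			if st.Status != 200 {
				failed[name] = st.Error
			}
		}
	}
	return failed, nil
}

func main() {
	// Made-up sample payload in the assumed shape.
	body := []byte(`[{"http": {"status": 200}}, {"grpc": {"status": 503, "error": "no workers"}}]`)
	failed, err := unhealthy(body)
	if err != nil {
		panic(err)
	}
	fmt.Println(failed) // prints: map[grpc:no workers]
}
```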

It will be implemented in v2024.3.0. Thanks to everyone who voted 👍

Thanks, nice to hear!