[💡 FEATURE REQUEST]: Simplifying the health check of the entire Application Server

Question

Closed this issue 2 months ago · 5 comments

Kaspiman commented 3 months ago

Plugin: Status

I have an idea! I propose reconsidering how server health is determined.

Problem:

According to the documentation, you need to list the plugins to check.

Whenever you enable or disable a plugin in a project, you must remember to update the health-check address as well, so the plugin list has to be kept in sync manually. For example: http://127.0.0.1:2114/health?plugin=http&plugin=grpc.
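To make the synchronization burden concrete, here is a minimal Python sketch of how a deployment script might have to build the health-check URL for each service. The `health_url` helper and the per-service plugin lists are hypothetical; only the host, port, and `plugin` query parameter come from the example above.

```python
from urllib.parse import urlencode


def health_url(plugins, host="127.0.0.1", port=2114):
    """Build a health-check URL that names every plugin explicitly.

    `health_url` is a hypothetical helper, not part of RoadRunner.
    """
    # Repeating the `plugin` key yields plugin=http&plugin=grpc.
    query = urlencode([("plugin", name) for name in plugins])
    return f"http://{host}:{port}/health?{query}"


# Each service needs its own list, kept in sync with its config by hand.
print(health_url(["http", "grpc"]))
print(health_url(["grpc"]))
```

Every service with a different plugin set produces a different URL, which is what makes a shared probe definition hard to write.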

The problem grows when you develop a large number of services. For example, I want a standard Kubernetes deployment with standard health checks for all services in the company.

k8s deployment fragment:

...
    readinessProbe:
      httpGet:
        path: /health?plugin=http&plugin=grpc
        port: 2114
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /health?plugin=http&plugin=grpc
        port: 2114
      initialDelaySeconds: 15
      periodSeconds: 20
...

Different services may have different sets of plugins and different addresses for health checks. It is possible that some of the services will not have an HTTP plugin at all.

The need to explicitly specify the list of plugins in the parameters makes it impossible to standardize such checks. You will have to manually control this list, which will inevitably lead to "forgotten" plugins and unreliable health information.

Proposal:

Define the concept of "application server is healthy". Make a single source for determining the health of the entire application server.

For example, a request to /health without parameters would succeed only if all enabled plugins are healthy. The list of enabled plugins can be easily calculated in a future version of RR (see [🧹 CHORE]: RoadRunner v2025 thoughts "Add enabled=true/false to the plugins' configuration").

Currently, that request returns an HTTP 400 Bad Request error: "No plugins provided in query. Query should be in form of: health?plugin=plugin1&plugin=plugin2". Backward compatibility will not be broken, since the endpoint cannot be used without parameters today.
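The proposed rule can be sketched as a small aggregation function. This illustrates the semantics only and is not RoadRunner's implementation: `aggregate_health`, the shape of the input mapping, and the use of 503 for an unhealthy server are all assumptions.

```python
def aggregate_health(plugin_statuses, unhealthy_code=503):
    """Return (status_code, failed_plugin_names) for the whole server.

    plugin_statuses maps each enabled plugin's name to the status code
    its own health check reported (a hypothetical input shape).
    """
    # The server counts as healthy only if every enabled plugin reports 200.
    failed = sorted(
        name for name, code in plugin_statuses.items() if code != 200
    )
    if failed:
        # 503 is an assumed default, not taken from the RoadRunner docs.
        return unhealthy_code, failed
    return 200, []


print(aggregate_health({"http": 200, "grpc": 200}))
print(aggregate_health({"http": 200, "grpc": 500}))
```

In this sketch a server with no enabled plugins is reported as healthy; whether that is the right behavior is a design decision the proposal leaves open.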

A single health-check source will:

- provide a single, reliable source of information
- simplify and standardize deployments

...
    readinessProbe:
      httpGet:
        path: /health
        port: 2114
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      httpGet:
        path: /health
        port: 2114
      initialDelaySeconds: 15
      periodSeconds: 20
...

- allow the use of a new k8s gRPC liveness probe.

Answer 1 · 2024-09-03T09:40:07.000Z

Yeah, good suggestion!

Answer 2 · 2024-09-03T10:07:43.000Z

It would be great to have that option.

Answer 3 · 2024-09-03T11:46:41.000Z

This improvement will save me from headaches)

Answer 4 · 2024-09-05T17:01:57.000Z

Hey @Kaspiman 👋
I understand what's happening here; however, a /health endpoint is common practice.
RR's internal mechanism for deducing active plugins works without the enable=true/false configuration option, so it would be easy to get all plugins that implement health checks.

I'll also update the endpoint's output on failure (one or more unhealthy plugins) to be JSON, so it can be parsed easily, instead of a plain-text representation. Something like this:

[
    {
        "<plugin_name>": {
            "status": 200,
            "error (in case of non-200 status)": "error message"
        }
    }
]

It will be implemented in v2024.3.0. Thanks to everyone who voted 👍
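A client could consume such a response roughly as follows. This is only a sketch under assumptions: it treats the body as a JSON array of single-key objects mapping a plugin name to a `status` and an optional `error` field, and the sample payload is invented for illustration. The format that actually ships may differ.

```python
import json


def unhealthy_plugins(body):
    """Map each plugin with a non-200 status to its error message.

    Assumes a JSON array of {"<plugin_name>": {"status": ..., "error": ...}}
    objects; the final RoadRunner format may differ.
    """
    failures = {}
    for entry in json.loads(body):
        for name, info in entry.items():
            if info.get("status") != 200:
                # A missing error field falls back to an empty message.
                failures[name] = info.get("error", "")
    return failures


# Made-up sample payload, not real RoadRunner output.
sample = '[{"http": {"status": 200}}, {"grpc": {"status": 500, "error": "no workers"}}]'
print(unhealthy_plugins(sample))
```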

Answer 5 · 2024-09-05T17:41:25.000Z

Thanks, nice to hear!