valeriano-manassero/helm-charts

/v1/node API is not accessible when the password file authentication is enabled

Closed this issue ยท 10 comments

Due to this issue, health check of the worker nodes are failing:
https://github.com/valeriano-manassero/helm-charts/blob/main/valeriano-manassero/trino/templates/configmap-worker.yaml#L49

I see two options:

  1. We change the health check
  2. We find a way to authenticate to the API . However, I couldn't find a way to authenticate for the API till now.

Any ideas?

I didn't noticed this issue since my setting is using an external loadbalancer so I have X-Forwarded-Proto: https but http plain requests are still working from inside pod with health check because basically Trino itself is still using http and skips auth.

In your case, are you Using TLS enabled directly on Trino installation?

nvm, I reproduced the issue that is specific on workers; the easy solution is to change the health check to just call localhost /v1/status but this will not be useful if there are issues contacting coordinator. Need to think about a solution.

Right. I have also used /v1/status temporarily. Btw, I am also using an external load-balancer.

My proposal is to use /v1/info/node that is also used by workers to complete discovery process. If this one is failing we know it's not possible for the worker to join the cluster so we can fail the check.

@fxulusoy wdyt?

Hey @valeriano-manassero, thanks for your effort on this helm chart. I was actually starting to deploy this and I noticed this issue and I believe your solution using the endpoint /v1/info/node makes sense since i think the health check here is basically reflecting successful discovery

Chart version 1.1.5 just released (it will be listed in artifacthub in ~30 mins), pls can you check if issue is now fixed?

@valeriano-manassero Does the /v1/info/node exist?

I have tried your health check command in the worker pod curl -H 'X-trino-User: healthCheck' -m 3 --silent trino:8080/v1/info/node and I got HTTP 404.

Health check should not fail now but this is not really checking anything, right? I did not have time to investigate this yet.

@valeriano-manassero Does the /v1/info/node exist?

I have tried your health check command in the worker pod curl -H 'X-trino-User: healthCheck' -m 3 --silent trino:8080/v1/info/node and I got HTTP 404.

Health check should not fail now but this is not really checking anything, right? I did not have time to investigate this yet.

I just noticed I did a mistake on the url, the right one is /v1/info/state, will release a new chart in minutes, sorry for the mistake.

Version 1.1.7 released with one more improvement so now health check is also testing the output for ACTIVE string so we should be able to detect issues if coordinator is responding with something unexpected.

Let me know if this last version improvement is good.

P.S. please be sure to use chart version 1.1.7 (I released a couple of new versions in the middle). This one should contain a good health check that is a real one ๐Ÿ˜„

Since thumbs up are there I'm going to close the issue ๐Ÿ˜„ . Feel free to open another one if some bugs arises.