support for getting node name from env var instead of `hostname`

Question



Hi @PalNilsson
We run jobs on Kubernetes, and processes running in pods see the pod name as the hostname rather than the name of the node. For example, in a pod named grid-job-16703192-mp47k (which is effectively the batch job ID):

bash-4.2$ hostname
grid-job-16703192-mp47k

This makes sense for k8s, and is appropriate because each pod has its own unique IP address, but it doesn't fit well with Panda. The result is https://bigpanda.cern.ch/wns/CA-VICTORIA-K8S-T2/?hours=12: every "node" is a random unique ID, so it is very difficult to correlate jobs with real nodes and to identify problematic nodes.

We can easily expose the real node name as an env var:

bash-4.2$ echo $MY_NODE_NAME
cluster-dev-k8s-node-a-2

So I would like to propose that the pilot look for a specific env var (maybe PANDA_NODE_NAME or PILOT_NODE_NAME?) and, if it is set, use its value instead of the output of hostname when reporting details to the Panda server. Would that be reasonable?
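A minimal sketch of the proposed lookup, for illustration only: `resolve_node_name` is a hypothetical helper, not the pilot's actual code. It takes candidate variable names in priority order, prints the value of the first one that is set and non-empty, and otherwise falls back to `hostname`.

```shell
# Hypothetical helper: print the first non-empty value among the named
# environment variables, or fall back to the output of `hostname`.
resolve_node_name() {
  local var value
  for var in "$@"; do
    # printenv only sees exported variables, i.e. what a child process
    # such as the pilot would actually inherit; unset gives an empty value.
    value=$(printenv "$var" 2>/dev/null) || value=""
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  hostname
}
```

For example, with PILOT_NODE_NAME=cluster-dev-k8s-node-a-2 exported in the pod and PANDA_NODE_NAME unset, `resolve_node_name PANDA_NODE_NAME PILOT_NODE_NAME` prints cluster-dev-k8s-node-a-2; with neither set, it prints the pod name from `hostname`.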

Answer 1 · 2022-08-19T12:53:50.000Z

Hi (back from vacation). Using PILOT_NODE_NAME sounds good. I can look for it and use it instead of hostname if set.

Answer 2 · 2022-08-23T14:10:55.000Z

Moved the discussion to https://its.cern.ch/jira/browse/ATLASPANDA-641 (I'm suggesting using the existing env var PANDA_HOSTNAME instead of a new one).

Implemented in dev pilot.
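On the site side, a variable like MY_NODE_NAME is commonly populated through the Kubernetes Downward API (an `env` entry with `valueFrom.fieldRef.fieldPath: spec.nodeName`). Assuming the pilot picks up PANDA_HOSTNAME as described above, a pod entrypoint could copy the node name into it before starting the pilot. The function name `export_panda_hostname` is hypothetical, and this is a sketch rather than a recommended production wrapper.

```shell
# Hypothetical entrypoint step: publish the real node name to the pilot
# via PANDA_HOSTNAME, using MY_NODE_NAME as exposed in the pod.
export_panda_hostname() {
  # Keep any value the site has already set explicitly; only fill it in
  # when PANDA_HOSTNAME is empty and MY_NODE_NAME is available.
  if [ -z "${PANDA_HOSTNAME:-}" ] && [ -n "${MY_NODE_NAME:-}" ]; then
    PANDA_HOSTNAME=$MY_NODE_NAME
    export PANDA_HOSTNAME
  fi
}
```

A simpler alternative is to have the pod spec set PANDA_HOSTNAME from `spec.nodeName` directly, which removes the need for this copy step entirely.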