Support for getting the node name from an env var instead of `hostname`
Hi @PalNilsson
Running jobs on Kubernetes, we face the issue that processes running in pods see the pod name, not the name of the node, as the hostname. For example, in a pod named grid-job-16703192-mp47k (which is effectively the batch job ID):
```
bash-4.2$ hostname
grid-job-16703192-mp47k
```
This makes sense for k8s, and is appropriate because each pod does have its own unique IP address, but it doesn't fit well with Panda. The result is visible here: https://bigpanda.cern.ch/wns/CA-VICTORIA-K8S-T2/?hours=12
Every "node" is a random unique ID, so it is very difficult to correlate jobs with real nodes and identify problematic nodes.
We can easily expose the real node name as an env var:
```
bash-4.2$ echo $MY_NODE_NAME
cluster-dev-k8s-node-a-2
```
So I would like to propose that the pilot look for a specific env var (maybe `PANDA_NODE_NAME` or `PILOT_NODE_NAME`?) and, if it exists, use it instead of the result of `hostname` when reporting details to the Panda server. Would that be reasonable?
Hi (back from vacation). Using `PILOT_NODE_NAME` sounds good. I can look for it and use it instead of `hostname` if set.
Moved the discussion to https://its.cern.ch/jira/browse/ATLASPANDA-641 (I'm suggesting using the existing env var `PANDA_HOSTNAME` instead of a new one).
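The fallback discussed here (use the env var if set, otherwise `hostname`) can be illustrated with a minimal shell sketch, assuming the variable is `PANDA_HOSTNAME`. This is not the pilot's actual code, which this thread does not show; `panda_node_name` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not taken from the pilot): print the node name to report.
# Prefers $PANDA_HOSTNAME and falls back to the output of `hostname`.
panda_node_name() {
    # ${VAR:-default} expands to the default when VAR is unset OR empty,
    # so an empty PANDA_HOSTNAME also falls back to hostname.
    printf '%s\n' "${PANDA_HOSTNAME:-$(hostname)}"
}

panda_node_name
```

In the pod above, setting `PANDA_HOSTNAME` from the exposed node name (for example `export PANDA_HOSTNAME="$MY_NODE_NAME"`) would make this print cluster-dev-k8s-node-a-2; with the variable unset, it prints the pod name grid-job-16703192-mp47k as before.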
Implemented in dev pilot.