indigo-dc/orchestrator

Cluster deployed with already expired token

mtangaro opened this issue · 3 comments

Dear experts,
when deploying an elastic cluster (with SLURM as the resource manager and Galaxy as the workflow manager), CLUES sometimes cannot contact the orchestrator to deploy nodes, because the token injected by the orchestrator has already expired.

Indeed, from the information in the token you can see:

"iss": "https://iam.recas.ba.infn.it/",
"exp": 1541413974,
"iat": 1541410374,

corresponding to (local time, UTC+1):

iat: Monday 5 November 2018 10:32:54
exp: Monday 5 November 2018 11:32:54
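For reference, a minimal Python sketch of how the claims above can be read and converted. The helper name `decode_jwt_payload` is hypothetical and is not taken from CLUES or the orchestrator; it only decodes the payload and does not verify the signature. The timestamps are printed in UTC, one hour behind the local times listed above.

```python
import base64
import json
from datetime import datetime, timezone


def decode_jwt_payload(token: str) -> dict:
    """Return the claims of a JWT without verifying its signature (hypothetical helper)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore the padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


# Claims copied from the token shown in this issue.
claims = {
    "iss": "https://iam.recas.ba.infn.it/",
    "exp": 1541413974,
    "iat": 1541410374,
}

issued = datetime.fromtimestamp(claims["iat"], tz=timezone.utc)
expires = datetime.fromtimestamp(claims["exp"], tz=timezone.utc)

print(issued.isoformat())             # 2018-11-05T09:32:54+00:00
print(expires.isoformat())            # 2018-11-05T10:32:54+00:00
print(claims["exp"] - claims["iat"])  # 3600 (token lifetime in seconds)
```

So the token lives for one hour from the moment it is issued.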

The CLUES log:
[PLUGIN-INDIGO-ORCHESTRATOR];ERROR;2018-11-05 11:40:23,067;ERROR getting deployment info: {"code":401,"title":"Unauthorized","message":"Invalid token: ***TOKEN***"}
[PLUGIN-INDIGO-ORCHESTRATOR];WARNING;2018-11-05 11:40:23,067;No resources obtained from orchestrator.
[PLUGIN-INDIGO-ORCHESTRATOR];DEBUG;2018-11-05 11:40:30,674;The access token is valid for -4056 seconds.
[PLUGIN-INDIGO-ORCHESTRATOR];ERROR;2018-11-05 11:40:30,674;Error refreshing access token: No client info provided.

This is not a rare occurrence, and users who are not INDIGO experts cannot recover the cluster.

Hi Marco,
The token is not expired at infrastructure creation time (otherwise the creation request to IM would fail).
It probably reaches its expiration time because the Ansible roles run tasks that take a long time to complete before the token is used.
Could you please give me an estimate of how long the Ansible roles take to reach the point where the injected access token is used?

Hi Alberto,
Without any non-elastic node deployed, CLUES is installed as the "first" step and takes 20 minutes to start after the deployment is submitted.
I'm using this template: https://github.com/indigo-dc/tosca-types/blob/master/examples/galaxy_elastic_cluster_full_elixirIT.yaml
With a non-elastic node this may take much longer, even hours, since I have to wait for Galaxy to be installed and for NFS to be configured between the master and worker nodes.
I'm including @micafer in the loop, maybe he can help.

Hi @mtangaro,
The problem is that the orchestrator plugin is the one that gets the refresh token using the injected access token. If CLUES is started after the access token has expired, it will fail.
We need a way to get the refresh token at the beginning of the configuration of the front-end node.
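As a rough illustration of that idea, the sketch below builds the body of an OAuth 2.0 token-exchange request (RFC 8693) that an early configuration step could send while the access token is still valid. This is an assumption about how the exchange might be done, not the plugin's actual code: the function name, the `offline_access` scope, and the placeholder token are all hypothetical, and client authentication (the "client info" mentioned in the log) is left to the caller.

```python
from urllib.parse import parse_qs, urlencode

# Grant and token type identifiers defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def build_token_exchange_form(access_token: str,
                              scope: str = "openid offline_access") -> str:
    """Build the form-encoded body of a token-exchange request (hypothetical helper).

    Assumption: requesting the offline_access scope makes the issuer
    return a refresh token along with the new access token.
    """
    fields = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": access_token,
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "scope": scope,
    }
    return urlencode(fields)


# Placeholder value; a real run would use the token injected by the orchestrator.
body = build_token_exchange_form("ACCESS_TOKEN_PLACEHOLDER")
print(parse_qs(body)["grant_type"][0])
```

The body would be POSTed to the issuer's token endpoint together with the client credentials, as the very first step on the front-end node, so that later long-running roles no longer depend on the one-hour access token.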