galaxyproject/cloudman

Race condition for bulk project creation


Following 75cfd31 there is a race condition when using projman_load_config.py to create projects in bulk.
The OIDC client secret injected into the context, for all apps except the data browser, relies on the secret deployed by the projman chart. This is fine when creating a project manually and adding apps afterwards, because projman gets deployed by the API when the project is created. However, when apps are installed in bulk right after creation (as is the case when using projman_load_config.py), some apps get deployed with context.project.oidc_client_secret still being None, which subsequently breaks logins.
The solution is likely to special-case projman, allowing that context value to be empty for its deployment, while making the API retry a few times for other apps rather than accepting the empty context value.
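
The proposed behaviour could look roughly like the sketch below. It is only an illustration under assumptions: `resolve_oidc_secret`, `fetch_secret`, and the `"projman"` chart-name check are hypothetical stand-ins, not CloudMan's actual API, and the attempt count and delay are placeholders.

```python
import time
from typing import Callable, Optional


def resolve_oidc_secret(
    chart_name: str,
    fetch_secret: Callable[[], Optional[str]],
    attempts: int = 3,
    delay_seconds: float = 5.0,
) -> Optional[str]:
    """Return the OIDC client secret, tolerating None only for projman.

    `fetch_secret` is a hypothetical callable that returns the secret, or
    None while the projman chart has not deployed it yet.
    """
    if chart_name == "projman":
        # projman itself deploys the secret, so an empty value is expected.
        return fetch_secret()

    for attempt in range(1, attempts + 1):
        secret = fetch_secret()
        if secret is not None:
            return secret
        if attempt < attempts:
            time.sleep(delay_seconds)  # give projman time to deploy the secret
    return None  # still missing after all attempts
```

With this shape, only the projman deployment proceeds immediately with an empty value; every other app waits for the secret before giving up.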

@almahmoud I just forked the repository. How do you advise I begin?

Looking at the old commit, 75cfd31, could give you an idea of what was added before (essentially, a name is created based on a template from the project name, and the secret is fetched with the KubeClient).
In terms of what to do for this specific issue, it's a matter of adding a retry on the try part of the existing try/except: perhaps retrying 3 times, at 5-second intervals, before returning None, for a first contribution.
You can use the tenacity library (here is a quick blog post about it: https://julien.danjou.info/python-tenacity/).

Down the line we can make it a bit smarter, like checking for the chart that is being installed, but this would be a good start!

Alrighty... Let me get that working locally

Hi @almahmoud, I was able to come up with:

import tenacity

# Retry up to 3 times, waiting 5 seconds between attempts, on any Exception.
retryer = tenacity.Retrying(
    stop=tenacity.stop_after_attempt(3),
    retry=tenacity.retry_if_exception_type(),
    wait=tenacity.wait_fixed(5))

retryer.call()

The question I have is: do I need the try/except block after using tenacity?