wtsi-hgi/hgi-systems

Check for problems with the cloud


When the cloud isn't completely working, it causes some unusual issues and error messages in Ansible. It's easy to waste a load of time going down rabbit holes without realising that the cloud isn't quite as fluffy as it should be...

It could be useful if we had something to discover cloud issues. This could be:

  • A status service that runs continuously, which Ansible could check before running. Ansible would then fail with a meaningful error message if a part of the cloud that it needs is down, and ignore the issue if the run won't use that part of the cloud.
  • A script Ansible runs to ensure the cloud is working.
  • A script that could be run manually (if the developer remembers) as part of the debugging process.

The first option would be preferable IMO. However, I suspect that it would take quite a lot of work to correctly integrate into our Ansible runs. Whether we should maintain such a service or expect others to provide it is debatable.