contiv-experimental/cluster

Proposal: support runtime configuration changes for clustermgr

mapuri opened this issue · 1 comments

This proposal describes a design approach to enable runtime changes to clusterm's configuration, and to provide initial support through a command line and OS signal handling.

Problem Description

clusterm's configuration could change in a few scenarios:

  • to change the location of ansible playbook
  • to update the credentials of ansible user
  • other advanced (and future) scenarios include changing the API ip/port; moving from serf to a different monitoring service; propagating a configuration change to multiple clusterm instances; richer clusterm configuration around host-group allocation policies, discovery-actions and so on.

At present, a change to clusterm's configuration requires a restart of the clusterm service. This has a few limitations, viz.

  • clusterm does not support a complete stateful restart, so it ends up losing some state after a restart. This will eventually be addressed, but the limitations below will still apply.
  • a service restart is a lot more disruptive than we may desire, i.e. clusterm could be in the middle of processing an event, and a forceful restart can leave side effects behind.
  • the service restart requirement makes it hard to use dynamic and versioned configuration management like consul+git2consul for clusterm in future (possibly with extra config watch code, but that is not in the scope of this PR)

Requirement Scope and Design overview

Following are the requirements and high-level changes that I think are suitable for the initial PR of this functionality:

  • add an API endpoint to POST configuration through the built-in event framework.
    • the REST call shall trigger an event to apply the posted configuration
    • the configuration shall be merged with defaults to allow partial updates
    • Note: we may want to merge the posted config with the current configuration (instead of the defaults). I can't think of a use case for that, and it seems a bit confusing and difficult to explain. We can address it in subsequent PRs, if a use case arises.
  • add a signal handler that triggers the REST endpoint: it re-parses the configuration file with which clusterm was started and POSTs it. This is going to be the 90% use case.
  • add a command line that triggers the REST endpoint and accepts new configuration from stdin or a file. This is more for testing and completeness. This is going to be the 10% use case.
  • only handle changes to ansible related configuration.
    • the rest of the configuration changes (like ip/port for REST requests; serf config etc) shall result in an error; support for them will be added later as we address more use-cases that open up with this feature.
  • add an API endpoint to GET the current clusterm configuration
    • this will require clusterm to remember (in-memory) its configuration.
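
The merge-with-defaults and ansible-only rules above can be sketched in Go as follows. This is a minimal illustration, not clusterm's actual code: the `Config`, `AnsibleConfig` and `ManagerConfig` types, their JSON keys, the placeholder default values and the `Apply` helper are all assumptions made for the example. It relies on the fact that `json.Unmarshal` into a pre-populated struct only overwrites fields present in the input.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Hypothetical configuration layout; clusterm's real types and keys may differ.
type AnsibleConfig struct {
	PlaybookLocation string `json:"playbook_location"`
	User             string `json:"user"`
}

type ManagerConfig struct {
	Addr string `json:"addr"`
}

type Config struct {
	Ansible AnsibleConfig `json:"ansible"`
	Manager ManagerConfig `json:"manager"`
}

// DefaultConfig returns placeholder defaults (illustrative values only).
func DefaultConfig() Config {
	return Config{
		Ansible: AnsibleConfig{PlaybookLocation: "/path/to/playbooks", User: "ansible-user"},
		Manager: ManagerConfig{Addr: "localhost:9999"},
	}
}

// MergeWithDefaults decodes the posted JSON on top of the defaults, so any
// field missing from the posted document keeps its default value.
func MergeWithDefaults(posted []byte) (Config, error) {
	cfg := DefaultConfig()
	if err := json.Unmarshal(posted, &cfg); err != nil {
		return Config{}, fmt.Errorf("parsing posted config: %w", err)
	}
	return cfg, nil
}

// ErrUnsupported is returned when a non-ansible setting would change.
var ErrUnsupported = errors.New("only ansible configuration can be changed at runtime")

// Apply returns the configuration to remember in memory, or an error (and the
// unchanged current configuration) if the update touches unsupported settings.
func Apply(current Config, posted []byte) (Config, error) {
	next, err := MergeWithDefaults(posted)
	if err != nil {
		return current, err
	}
	if next.Manager != current.Manager {
		return current, ErrUnsupported
	}
	return next, nil
}

func main() {
	current := DefaultConfig()

	// A partial update that only touches the ansible user.
	next, err := Apply(current, []byte(`{"ansible": {"user": "ops"}}`))
	fmt.Println(next.Ansible, err)

	// An update to the API address is rejected.
	_, err = Apply(current, []byte(`{"manager": {"addr": "127.0.0.1:1"}}`))
	fmt.Println(err)
}
```

One consequence of merging with defaults in this sketch: if clusterm was started with a non-default `manager` section, a posted document that omits that section resolves to the defaults and is rejected as an unsupported change. Re-posting the complete startup file, as the signal handler does, avoids this.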

UX

  • API endpoint
    • GET: /config
      Response:

      • Code
        200 : always
      • Body:
        {
        "config" : {< json-config >}
        }
    • POST: /config
      Request

      • Body:
        {
        "config" : {< json-config >}
        }

      Response:

      • 200 : on successfully triggering the configuration update
      • 500 : on error, such as trying to update unsupported configuration (like the cluster API port)
  • CLI
    • clusterctl config get : returns the current clusterm config
    • clusterctl config update < - > : reads json config from stdin and POSTs it to clusterm
    • clusterctl config update --file < filename > : reads json config from the specified file and POSTs it to clusterm
  • SIGHUP Signal handler
    • add a SIGHUP handler that reads the json configuration file clusterm was started with and POSTs it
    • a configuration error returned by clusterm is silently ignored in this case (the user might need to check the clusterm logs)
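
The /config endpoint described above might look roughly like this as a Go `http.Handler`. It is a sketch under assumptions: `configHandler`, `ansibleOnly`, `newDemoHandler` and `roundTrip` are hypothetical names, the top-level `"ansible"` key is assumed, and the update is applied synchronously through an `apply` hook, whereas the proposal routes it through clusterm's event framework. `roundTrip` uses `net/http/httptest` only so the example runs without opening a port.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// envelope is the body shape from the proposal: {"config": {...}}.
type envelope struct {
	Config json.RawMessage `json:"config"`
}

// configHandler keeps the active configuration in memory so GET can return it.
// apply is a hypothetical hook standing in for the event clusterm would trigger.
type configHandler struct {
	mu      sync.RWMutex
	current json.RawMessage
	apply   func(json.RawMessage) error
}

func (h *configHandler) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	switch r.Method {
	case http.MethodGet:
		h.mu.RLock()
		// current always holds valid JSON, so Marshal cannot fail here.
		body, _ := json.Marshal(envelope{Config: h.current})
		h.mu.RUnlock()
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	case http.MethodPost:
		var req envelope
		err := json.NewDecoder(r.Body).Decode(&req)
		if err == nil && len(req.Config) == 0 {
			err = errors.New(`request body has no "config" key`)
		}
		if err == nil {
			h.mu.Lock()
			if err = h.apply(req.Config); err == nil {
				h.current = req.Config
			}
			h.mu.Unlock()
		}
		if err != nil {
			// The proposal defines only 200 and 500, so every failure maps to 500.
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	default:
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
	}
}

// ansibleOnly rejects any top-level key other than "ansible" (an assumed key name).
func ansibleOnly(cfg json.RawMessage) error {
	var keys map[string]json.RawMessage
	if err := json.Unmarshal(cfg, &keys); err != nil {
		return err
	}
	for k := range keys {
		if k != "ansible" {
			return fmt.Errorf("changing %q at runtime is not supported", k)
		}
	}
	return nil
}

// newDemoHandler starts with an empty configuration and the ansible-only rule.
func newDemoHandler() *configHandler {
	return &configHandler{current: json.RawMessage(`{}`), apply: ansibleOnly}
}

// roundTrip exercises a handler in-process, without opening a socket.
func roundTrip(h http.Handler, method, body string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, "/config", strings.NewReader(body)))
	return rec.Code, rec.Body.String()
}

func main() {
	h := newDemoHandler()
	code, _ := roundTrip(h, http.MethodPost, `{"config":{"ansible":{"user":"ops"}}}`)
	fmt.Println("POST ansible change:", code)
	code, _ = roundTrip(h, http.MethodPost, `{"config":{"manager":{"addr":":1"}}}`)
	fmt.Println("POST manager change:", code)
	_, body := roundTrip(h, http.MethodGet, "")
	fmt.Println("GET:", body)
}
```

Both `clusterctl config update` and the SIGHUP handler would then be thin clients of this endpoint: each reads a JSON document (from stdin, a named file, or the startup file), wraps it under the `config` key and POSTs it.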

/cc @vvb

Fixed by a combination of #175, #176 and #177