contiv-experimental/cluster

Proposal: add REST endpoints for monitoring subsystem

mapuri opened this issue · 2 comments

Addresses #97 but in general makes node discovery events programmatic and pluggable.

Problem description:

  • serf monitoring subsystem delivers the events from serf-agent to clusterm using a basic callback mechanism
  • clusterm registers one of its member functions as the callback, which directly injects the node discovered/disappeared events.
  • While simple, this implementation has a few limitations:
    • it always delivers the event to the local clusterm, which tries to handle it locally. This adds complexity when we need to add multi-instance support for clusterm, viz. conflict resolution in case of contradicting status updates for a node.
    • it creates an unnecessary coupling between the clusterm and serf processes: serf must run on the same host as clusterm. This breaks environments where a user may want to run clusterm outside the cluster, or on a node that won't be made part of the cluster.
      • Note: the proposal in this PR makes such deployments possible, but they will also require a separate script on the node running serf to POST events to clusterm.

Overview of changes:

  • define REST endpoints and client methods to POST node discovered/disappeared events to clusterm
  • remove the callback mechanism in monitor subsystem
  • POST to the REST endpoint from the event handler of serf's monitoring subsystem
  • inject discovered/disappeared events on receiving the REST request.
  • the address/URL to POST to shall be specified in the clusterm configuration. It shall default to localhost:<cluster API port>
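As a rough illustration of the client side of these changes, here is a minimal Go sketch of what the serf event handler could call in place of the callback. The names `postNodeEvent` and `demo`, the JSON encoding of the body, and the sample node values are assumptions made for this example, not part of the proposal; the stub server only stands in for clusterm.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// postNodeEvent is a hypothetical client helper. It POSTs the node's
// attributes to <baseURL>/monitor/<event>/<node>, where event is
// "discovered" or "disappeared". baseURL would come from the clusterm
// configuration.
func postNodeEvent(baseURL, event, node string, info map[string]string) error {
	body, err := json.Marshal(info)
	if err != nil {
		return err
	}
	endpoint := fmt.Sprintf("%s/monitor/%s/%s", baseURL, event, url.PathEscape(node))
	resp, err := http.Post(endpoint, "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("posting %s event for %s: unexpected status %s", event, node, resp.Status)
	}
	return nil
}

// demo runs postNodeEvent against a stub server standing in for clusterm
// and returns the request path the stub received.
func demo() (string, error) {
	paths := make(chan string, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		paths <- r.URL.Path
		w.WriteHeader(http.StatusOK)
	}))
	defer srv.Close()

	// Sample values, invented for the example.
	err := postNodeEvent(srv.URL, "discovered", "node1", map[string]string{
		"Node-Label":        "node1",
		"Node-Serial":       "0001",
		"Node-Mgmt-Address": "192.0.2.10",
	})
	if err != nil {
		return "", err
	}
	return <-paths, nil
}

func main() {
	path, err := demo()
	fmt.Println(path, err)
}
```

Because the target is just a URL, the same helper would work whether clusterm runs on the local host or elsewhere, which is the decoupling this proposal is after.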

UX consideration:

REST Endpoint:

  • POST: monitor/discovered/{node}
    Body:
    {
    map[string]string // Node-Label, Node-Serial and Node-Mgmt-Address are the three keys expected in the map
    }
    Response:
    • 200: on successful event injection
    • 500: on error
  • POST: monitor/disappeared/{node}
    Body:
    {
    map[string]string // Node-Label, Node-Serial and Node-Mgmt-Address are the three keys expected in the map
    }
    Response:
    • 200: on successful event injection
    • 500: on error
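To make the endpoint contract concrete, here is a minimal Go sketch of a handler for the two per-node endpoints, using only the standard library. The `injectFunc` hook, the `monitorHandler` and `post` names, the JSON body encoding, the rejection of bodies that lack one of the three keys, and the 404 for unrecognized paths are all assumptions made for illustration; the proposal itself only specifies the 200 and 500 responses.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// injectFunc is a hypothetical hook standing in for clusterm's event injection.
type injectFunc func(event, node string, info map[string]string) error

// requiredKeys lists the three keys the proposal expects in the body.
var requiredKeys = []string{"Node-Label", "Node-Serial", "Node-Mgmt-Address"}

// validate reports the first required key that is missing or empty.
func validate(info map[string]string) error {
	for _, k := range requiredKeys {
		if info[k] == "" {
			return errors.New("missing key " + k)
		}
	}
	return nil
}

// monitorHandler serves POST /monitor/{discovered|disappeared}/{node}.
func monitorHandler(inject injectFunc) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		parts := strings.Split(strings.Trim(r.URL.Path, "/"), "/")
		if r.Method != http.MethodPost || len(parts) != 3 || parts[0] != "monitor" ||
			(parts[1] != "discovered" && parts[1] != "disappeared") {
			http.NotFound(w, r) // assumption: unknown routes get a 404
			return
		}
		var info map[string]string
		err := json.NewDecoder(r.Body).Decode(&info)
		if err == nil {
			err = validate(info)
		}
		if err == nil {
			err = inject(parts[1], parts[2], info)
		}
		if err != nil {
			// Only 200 and 500 are defined, so every failure maps to 500.
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
}

// post sends a POST to the handler in-process and returns the status code.
func post(h http.Handler, path, body string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, path, strings.NewReader(body)))
	return rec.Code
}

func main() {
	h := monitorHandler(func(event, node string, info map[string]string) error {
		fmt.Printf("inject %s for %s: %v\n", event, node, info)
		return nil
	})
	ok := `{"Node-Label":"node1","Node-Serial":"0001","Node-Mgmt-Address":"192.0.2.10"}`
	fmt.Println(post(h, "/monitor/discovered/node1", ok))
	fmt.Println(post(h, "/monitor/disappeared/node1", `{}`))
}
```

Keeping injection behind a function value means the handler does not care whether the request came from the local serf monitor or from a remote script.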

CLI:

  • None required for initial implementation

Web:

  • None required for initial implementation

System Testing:

  • existing mode of operation shall see no impact because of this change

It might be desirable to post events for multiple nodes in one request, so I am thinking of adding the following REST endpoint instead:

POST: monitor/event/{ discovered | disappeared }
Body:
{
[ ] map[string]string // Node-Label, Node-Serial and Node-Mgmt-Address are the three keys expected in the map for each node
}
Response:
200: on successful event injection
500: on error
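
For illustration, a minimal Go sketch of decoding such a multi-node body. It assumes the body is encoded as a JSON array of string maps and that one invalid entry rejects the whole request; `parseBatch` and the sample values are hypothetical and not part of the proposal.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requiredKeys lists the three keys expected in the map for each node.
var requiredKeys = []string{"Node-Label", "Node-Serial", "Node-Mgmt-Address"}

// parseBatch decodes a JSON array of per-node maps and checks that each
// entry carries all required keys. Any invalid entry fails the whole batch.
func parseBatch(body []byte) ([]map[string]string, error) {
	var nodes []map[string]string
	if err := json.Unmarshal(body, &nodes); err != nil {
		return nil, err
	}
	for i, n := range nodes {
		for _, k := range requiredKeys {
			if n[k] == "" {
				return nil, fmt.Errorf("node %d: missing key %q", i, k)
			}
		}
	}
	return nodes, nil
}

func main() {
	// Sample values, invented for the example.
	body := []byte(`[
		{"Node-Label":"node1","Node-Serial":"0001","Node-Mgmt-Address":"192.0.2.10"},
		{"Node-Label":"node2","Node-Serial":"0002","Node-Mgmt-Address":"192.0.2.11"}
	]`)
	nodes, err := parseBatch(body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range nodes {
		fmt.Println("would inject event for", n["Node-Label"])
	}
}
```

Since the node name no longer appears in the URL, each entry has to identify its node through the map itself, for example via Node-Label.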

fixed by #124