RFC: Tribe - subtribes, policy, message encryption and calling remote plugins
jcooklin opened this issue · 4 comments
This spec proposes the following features and enhancements to tribe.
- Group members of the global tribe into named subtribes, replacing what today is called agreements
- Tribe policies to dynamically add nodes to subtribes
- Enable calling remote plugins through sharing plugin meta via tribe
- Encryption of tribe messages
Subtribes
To improve the operational experience, what is currently known as an ‘agreement’
will be replaced with the term ‘tribe’, and actions that affect a tribe will be
made explicit. Let’s start with an example.
Instead of creating a named agreement, we will create a named tribe and
join all the members to it.
i1> snapctl tribe create core
i1> snapctl tribe join core i1
i1> snapctl tribe join core i2
i1> snapctl tribe join core i3
i1> snapctl tribe join core i4
Let’s imagine that i1 and i2 are somehow special and should have additional
plugins loaded and tasks running beyond what is defined by the core tribe.
i1> snapctl tribe create storage
i1> snapctl tribe join storage i1
i1> snapctl tribe join storage i2
On our core tribe we will load an influxdb and psutil plugin and start a task
capturing basic OS utilization details. On our storage tribe we will
load the smart plugin and start a task capturing disk IO.
i4> snapctl plugin load influxdb --tribe core
i4> snapctl plugin load psutil --tribe core
i4> snapctl task create -t psutil-influx.json --tribe core
i4> snapctl plugin load smart --tribe storage
i4> snapctl task create -t disk-io.json --tribe storage
Explicitly referring to the tribe when loading plugins or tasks reduces the risk
that a user accidentally affects the entire tribe when they perform actions that
are intended for an individual node. It also more effectively supports multiple
potentially overlapping tribes.
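To make the overlap concrete, here is a minimal sketch of the membership model the example above implies. It is only an illustration: `TribeRegistry` and its methods are hypothetical names, not part of snap, and the real state would be shared between nodes rather than held in one object.

```python
class TribeRegistry:
    """Toy model of named, possibly overlapping tribes (hypothetical, not snap's API)."""

    def __init__(self):
        self.members = {}  # tribe name -> set of node names
        self.plugins = {}  # tribe name -> list of plugin names

    def create(self, tribe):
        self.members.setdefault(tribe, set())
        self.plugins.setdefault(tribe, [])

    def join(self, tribe, node):
        if tribe not in self.members:
            raise KeyError(f"unknown tribe: {tribe}")
        self.members[tribe].add(node)

    def load_plugin(self, plugin, tribe):
        # The tribe is always named explicitly, mirroring `--tribe` in the CLI examples.
        if tribe not in self.members:
            raise KeyError(f"unknown tribe: {tribe}")
        self.plugins[tribe].append(plugin)

    def plugins_for(self, node):
        """Plugins a node should run: the union over every tribe it belongs to."""
        return sorted(
            plugin
            for tribe, nodes in self.members.items()
            if node in nodes
            for plugin in self.plugins[tribe]
        )


registry = TribeRegistry()
registry.create("core")
for node in ("i1", "i2", "i3", "i4"):
    registry.join("core", node)
registry.create("storage")
for node in ("i1", "i2"):
    registry.join("storage", node)

registry.load_plugin("influxdb", "core")
registry.load_plugin("psutil", "core")
registry.load_plugin("smart", "storage")

print(registry.plugins_for("i1"))  # ['influxdb', 'psutil', 'smart']
print(registry.plugins_for("i3"))  # ['influxdb', 'psutil']
```

Nodes i1 and i2 end up with the plugins of both tribes, while i3 and i4 only get what the core tribe defines.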
Dynamically adding nodes to tribes through policies
When started in tribe mode, snap will establish a list of facts collected from
the node it is running on as well as arbitrary key/value pairs provided on
startup. These facts will then be used to evaluate tribe policies. When a
policy is evaluated positively it will result in the node being added to a
tribe.
Example facts: architecture, default_ipv4, default_ipv6, devices, os_dist,
os_dist_release, os_dist_version, processor_type, processor_features, memtotal,…
Adding a policy:
i1> snapctl tribe policy create ubuntu_policy "{{os_dist}} == Ubuntu" --tribe core
i1> snapctl tribe policy create storage_policy "{{os_dist}} == Ubuntu && {{storage_tier}} == True" --tribe storage
When snap is started in tribe mode on an Ubuntu host with the policies above
configured, it will automatically join the core tribe. If it has the
fact storage_tier=True it will also be added to the storage tribe.
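As a rough sketch of how such policies could be evaluated against a node's facts: the evaluator below only understands `{{fact}} == value` clauses joined by `&&`, which is all the two example policies use. The function names and the string-comparison semantics are assumptions for illustration; the RFC does not define the expression grammar.

```python
import re

# Matches a fact reference such as {{os_dist}}.
FACT_REF = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def evaluate_policy(expression, facts):
    """Return True when every `{{fact}} == value` clause joined by && holds."""
    for clause in expression.split("&&"):
        left, sep, right = clause.partition("==")
        match = FACT_REF.fullmatch(left.strip())
        if not sep or match is None:
            raise ValueError(f"unsupported clause: {clause.strip()!r}")
        actual = facts.get(match.group(1))
        # A fact the node does not have never matches (assumed semantics).
        if actual is None or str(actual) != right.strip():
            return False
    return True


def tribes_to_join(policies, facts):
    """policies is a list of (tribe, expression) pairs."""
    return sorted({tribe for tribe, expr in policies if evaluate_policy(expr, facts)})


policies = [
    ("core", "{{os_dist}} == Ubuntu"),
    ("storage", "{{os_dist}} == Ubuntu && {{storage_tier}} == True"),
]

print(tribes_to_join(policies, {"os_dist": "Ubuntu"}))
# ['core']
print(tribes_to_join(policies, {"os_dist": "Ubuntu", "storage_tier": "True"}))
# ['core', 'storage']
```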
Other affected components
- Global config will need to support arbitrary facts.
Process and publish through remote nodes (calling remote plugins)
Tribe will make it possible to reference a named tribe in the process and/or
publish portion of a task definition. When a plugin is loaded on a node that
is associated with a tribe, it will share its connection details with the global
tribe as part of its metadata. This enables each snap node in the tribe to call
remote plugins.
Other affected components
- The scheduler will need to be extended to accept a task manifest with named tribe details
- Control will need to deal with remote plugin subscriptions
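One way to picture the lookup that control or the scheduler would need: given the plugin metadata shared through the global tribe, find the connection details of a plugin loaded on nodes of a named tribe. Everything here is hypothetical (the `PluginMeta` fields, the `resolve_remote` helper, and the addresses); the RFC does not specify the metadata format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginMeta:
    name: str     # plugin name, e.g. "influxdb"
    kind: str     # "processor" or "publisher"
    node: str     # node the plugin is loaded on
    tribe: str    # tribe the plugin was loaded for
    address: str  # connection detail shared via tribe (made-up format)


def resolve_remote(shared_meta, kind, name, tribe):
    """Return the addresses of matching plugins loaded for the named tribe."""
    return [
        meta.address
        for meta in shared_meta
        if (meta.kind, meta.name, meta.tribe) == (kind, name, tribe)
    ]


# Metadata as it might look after the influxdb publisher is loaded on the core tribe.
shared_meta = [
    PluginMeta("influxdb", "publisher", "i1", "core", "10.0.0.1:8182"),
    PluginMeta("influxdb", "publisher", "i2", "core", "10.0.0.2:8182"),
    PluginMeta("smart", "collector", "i1", "storage", "10.0.0.1:8183"),
]

# A task whose publish step references the "core" tribe could be routed to either node.
print(resolve_remote(shared_meta, "publisher", "influxdb", "core"))
# ['10.0.0.1:8182', '10.0.0.2:8182']
```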
Encrypt tribe messages
Protect tribe communication by supporting symmetric key encryption.
- Tribe encryption will require the encryption key when starting
- A helper for generating the encryption key will be provided (example: snapd keygen)
- The global config will support the tribe encryption key
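A sketch of what the key helper might do, assuming a 256-bit key that is base64-encoded so it can be passed on the command line or stored in the global config. The key size, the encoding, and the function names are assumptions; the RFC only says that a helper will exist.

```python
import base64
import secrets

KEY_BYTES = 32  # assumption: 256-bit symmetric key


def generate_key():
    """Return a fresh random key, base64-encoded for use in config or flags."""
    return base64.b64encode(secrets.token_bytes(KEY_BYTES)).decode("ascii")


def load_key(encoded):
    """Decode a configured key and reject anything of the wrong size."""
    key = base64.b64decode(encoded, validate=True)
    if len(key) != KEY_BYTES:
        raise ValueError(f"expected a {KEY_BYTES}-byte key, got {len(key)} bytes")
    return key


encoded = generate_key()
key = load_key(encoded)
print(len(key))  # 32
```

Validating the key at startup means a node with a malformed key fails immediately instead of joining and then being unable to decrypt tribe messages.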
This looks good, but I think we should break out the encryption piece. In my mind, there are varying degrees of complexity on how this should work. This variance is dependent on how we manage the remote communication between nodes.
For example, I don't believe it is safe to have a symmetric key sitting on the file system. If we are going to go the payload route, I think it will require some two-step handshaking, not unlike how snapd handles encryption between itself and plugins.
However, if these calls are going to be HTTP-based, we could just go the transport route and use the existing HTTPS code in the API.
Very interesting. However, some questions come to my mind:
- How is the global tribe created? I guess by just starting the first snapd instance with --tribe and then the other instances with --tribe-seed?
- Once a first subtribe has been created but no snap nodes are assigned to it yet, would it still be possible to load plugins, tasks and workflows into it, so that whenever a node joins it inherits those?
- Will it be possible to load a plugin or create a task on a specific node and choose whether this gets propagated to all tribes or to none?
- Will it be easily possible to extend policies (and potentially even dynamically, like plugins)?
Thanks
Good questions Olivier :)
Another question from me, related to the dynamic addition of nodes to tribes through policies. I am wondering if we could go one step further and actually create the tribe dynamically if it doesn't exist. E.g. snapd is started on a node, the tribe policy adds it to tribe "foo", tribe "foo" is not known so it is created (and gossip will take care of making the other nodes aware of that tribe). In this way, the initial step of pre-creating the tribes would be unnecessary. I think I have a use case where this functionality could be very useful. Would that make sense?
Thanks.