intelsdi-x/snap

RFC: Redux on Plugin Selection (Routing) and Caching

pittma opened this issue · 5 comments

Introduction

When a task is started in Snap, it begins by subscribing to a plugin pool. This pool holds references to running instances of a given plugin. If a subscription results in the need for a new plugin to be started, Snap's plugin runner starts one. We call this need eligibility. How plugins are chosen inside this pool we call routing.

In this specification I propose a redux on the routing module in Snap. This rethinking is predicated on the use cases enumerated in this document. The existing implementation of plugin routing in Snap is not well suited for these use cases, so the result would be an almost complete overhaul of plugin selection in Snap.

Prerequisite Changes

Snap

Right now, the apPool type is the first-class object when selecting a plugin. I propose that a new type (router? selector?) be the top-level type and take over the responsibilities of the current apPool: pooling, subscriptions, eligibility, and routing. Modularizing the entire pool, rather than just the router, provides the tools needed for a good implementation, because an implementation needs all of this state to make intelligent selection decisions.

Plugins

A plugin writer adds a new plugin.Meta key to their plugin's metadata called routing. This key contains a value from an enum defined in the plugin package. When a plugin is loaded its routing metadatum is applied to the selector for this plugin.

In order for the router to have access to the state it needs to correctly select the target plugin, new information will also have to be passed into the calls for Collect, Process, and Publish. I propose that a new parameter be added to these function signatures: taskId string.
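A minimal sketch of what the routing metadatum and the extra parameter might look like. Everything here is an assumption for illustration: the `RoutingStrategy` type, the `DefaultRouting` zero value, the trimmed-down `Meta` struct, and the shape of the `Processor` interface are hypothetical and do not reflect Snap's actual `plugin` package. Only the names `StickyRouting`, `ConfigBasedRouting`, and `taskId` come from this proposal.

```go
package main

import "fmt"

// RoutingStrategy is a hypothetical enum for the proposed "routing" key.
type RoutingStrategy int

const (
	DefaultRouting     RoutingStrategy = iota // assumed zero value, not named in the RFC
	StickyRouting                             // Case 1: one task per plugin instance
	ConfigBasedRouting                        // Case 2: tasks with equal config share an instance
)

// String returns a readable name for logging.
func (r RoutingStrategy) String() string {
	switch r {
	case StickyRouting:
		return "sticky"
	case ConfigBasedRouting:
		return "config-based"
	default:
		return "default"
	}
}

// Meta stands in for plugin.Meta and carries only the fields used here.
type Meta struct {
	Name    string
	Version int
	Routing RoutingStrategy
}

// Processor illustrates a call signature carrying the proposed taskId.
// The remaining parameters are placeholders, not Snap's real signature.
type Processor interface {
	Process(taskId string, contentType string, content []byte, config map[string]string) (string, []byte, error)
}

func main() {
	m := Meta{Name: "movingaverage", Version: 1, Routing: StickyRouting}
	fmt.Printf("%s v%d uses %s routing\n", m.Name, m.Version, m.Routing)
}
```

With a zero value that means "default", plugins that never set the key would keep today's behavior.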

Cache

Related to plugin routing is caching. Right now, the caching is contained in the client and caches metrics on a per metric namespace and version basis with a global expiration.

In order for caching to be correct and effective, it should be moved into the new type described here. Further, calls to this new type should expose CollectMetrics and not expose the running plugin or the cache. With the cache moved to this proposed type, a plugin author could expose a minimum cache TTL, and Snap would instantiate the appropriate cache type based on the routing strategy described here (per plugin type and version, per running plugin, or per config).
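As a rough illustration of a cache owned by the selector rather than the client, here is a small TTL cache sketch. The `ttlCache` and `manualClock` types and the key format are hypothetical; the idea is only that the key prefix (plugin name and version, a task ID, or a config hash) would be chosen by the routing strategy, and that the TTL would come from the plugin author's minimum.

```go
package main

import (
	"fmt"
	"time"
)

// entry is one cached value together with its expiry time.
type entry struct {
	value   string
	expires time.Time
}

// ttlCache is a hypothetical per-selector cache. It is not safe for
// concurrent use without external locking.
type ttlCache struct {
	ttl   time.Duration
	now   func() time.Time // injected so expiry can be exercised without sleeping
	items map[string]entry
}

func newTTLCache(ttl time.Duration, now func() time.Time) *ttlCache {
	return &ttlCache{ttl: ttl, now: now, items: make(map[string]entry)}
}

// Put stores value under key and stamps it with the cache's TTL.
func (c *ttlCache) Put(key, value string) {
	c.items[key] = entry{value: value, expires: c.now().Add(c.ttl)}
}

// Get returns the value if present and not yet expired.
func (c *ttlCache) Get(key string) (string, bool) {
	e, ok := c.items[key]
	if !ok || !c.now().Before(e.expires) {
		delete(c.items, key) // drop stale entries; a no-op when the key is absent
		return "", false
	}
	return e.value, true
}

// manualClock is a demonstration clock that only moves when told to.
type manualClock struct{ t time.Time }

func (m *manualClock) Now() time.Time          { return m.t }
func (m *manualClock) Advance(d time.Duration) { m.t = m.t.Add(d) }

func main() {
	clk := &manualClock{t: time.Unix(0, 0)}
	c := newTTLCache(time.Second, clk.Now)

	// Assumed key format: a strategy-dependent scope plus the metric namespace.
	key := "task-1|/intel/mock/foo"
	c.Put(key, "42")
	_, hit := c.Get(key)
	fmt.Println("fresh hit:", hit)

	clk.Advance(2 * time.Second)
	_, hit = c.Get(key)
	fmt.Println("hit after TTL:", hit)
}
```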

Use cases

Case 1: The one-to-one (sticky)

Description

A task needs to return to the same plugin each time, say a processor which is computing moving averages. These MAs should not be polluted by other tasks' data. Since the implementation of the plugin is not capable of differentiating between one task and another, that task should have exclusive access to this plugin.

Proposed Implementation

A plugin writer sets their routing value to plugin.StickyRouting. The task's ID can be used as a key for the plugin. If the task is new, a new plugin is started. If the task is new and it would cause the max pool size to be exceeded, an error is returned on task creation. This error should come from an eligible call to the routing implementation, and bubble all the way back up to the API caller.
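The sticky behavior described above could be sketched roughly as follows. The `stickyPool` type, its method names, and `ErrPoolFull` are hypothetical and only model the decision logic; starting a real plugin process is replaced by allocating a small struct, and no locking is shown.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrPoolFull is a hypothetical error surfaced at task creation.
var ErrPoolFull = errors.New("sticky pool is full")

// instance stands in for a running plugin.
type instance struct{ id int }

// stickyPool maps each task ID to its own plugin instance.
type stickyPool struct {
	max    int
	nextID int
	byTask map[string]*instance
}

func newStickyPool(max int) *stickyPool {
	return &stickyPool{max: max, byTask: make(map[string]*instance)}
}

// Eligible reports whether a new plugin must be started for taskID.
// It returns an error when starting one would exceed the pool's upper bound.
func (p *stickyPool) Eligible(taskID string) (bool, error) {
	if _, ok := p.byTask[taskID]; ok {
		return false, nil // the task already owns an instance
	}
	if len(p.byTask) >= p.max {
		return false, ErrPoolFull
	}
	return true, nil
}

// Subscribe returns the task's instance, starting one if the task is new.
func (p *stickyPool) Subscribe(taskID string) (*instance, error) {
	needNew, err := p.Eligible(taskID)
	if err != nil {
		return nil, err // bubbles up to whoever is creating the task
	}
	if needNew {
		p.nextID++
		p.byTask[taskID] = &instance{id: p.nextID}
	}
	return p.byTask[taskID], nil
}

func main() {
	p := newStickyPool(1)
	a1, _ := p.Subscribe("task-a")
	a2, _ := p.Subscribe("task-a")
	fmt.Println("same instance:", a1 == a2)

	_, err := p.Subscribe("task-b")
	fmt.Println("second task:", err)
}
```

In this sketch a full pool fails the subscription instead of exempting the sticky selector from the bound.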

Limitations

Snap's goal in maintaining the smallest possible footprint has many facets, one of which is the upper bound on the number of plugin instances in a given pool. With a sticky router, the implementation must choose to either fail the creation of a task if this upper bound is exceeded, or exempt a selector which is of type sticky from this upper bound altogether. I propose the former.

Case 2: The config based router

Description

Take a publisher which opens, and then maintains a connection to RabbitMQ. Tasks which share configuration data, like RabbitMQ node, credentials, and even queue/exchange data could share a plugin instance.

Proposed Implementation

A plugin writer sets their routing value to plugin.ConfigBasedRouting. This could be achieved by hashing the config data and then using that hash as a key for selecting a plugin instance. This results in behavior quite similar to the sticky router described above; however, it has the advantage of avoiding the special exceptions for exceeding the max pool size.
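One way the hashing step might look, as a sketch only. It assumes the config can be flattened to a `map[string]string`, which is a simplification and not Snap's real config type, and the `configKey` helper is hypothetical. Keys are sorted before hashing because Go's map iteration order is unspecified, so hashing in iteration order would not give a stable key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// configKey derives a stable lookup key from a flattened config.
func configKey(cfg map[string]string) string {
	keys := make([]string, 0, len(cfg))
	for k := range cfg {
		keys = append(keys, k)
	}
	sort.Strings(keys) // fixed order so equal configs hash equally

	h := sha256.New()
	for _, k := range keys {
		// Length prefixes keep ("ab","c") distinct from ("a","bc").
		fmt.Fprintf(h, "%d:%s=%d:%s;", len(k), k, len(cfg[k]), cfg[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	taskA := map[string]string{"host": "rabbit-1", "user": "snap", "queue": "metrics"}
	taskB := map[string]string{"queue": "metrics", "user": "snap", "host": "rabbit-1"}
	taskC := map[string]string{"host": "rabbit-2", "user": "snap", "queue": "metrics"}

	fmt.Println("A and B share an instance:", configKey(taskA) == configKey(taskB))
	fmt.Println("A and C share an instance:", configKey(taskA) == configKey(taskC))
}
```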

Limitations

If a plugin does require differentiating between tasks, the config-based router puts the onus on the plugin author to implement this logic.

Looks great @danielscottt.

As a related item, how do you feel about a separate feature that allows overriding max-running-plugin per plugin through the global config?

This is really good.

Added Cache section to the spec.

I would like to implement Case 2: The config based router. I analyzed the current source code and tried to figure out how to implement it, but there is one thing I don't quite understand. @danielscottt suggested hashing the config data to achieve proper config-based routing, but I can't figure out an easy way to get the config data when inserting a new plugin into the pool.
The pool's method Insert(a AvailablePlugin) operates on the AvailablePlugin type, which does not store config data. Is there any way to achieve that without modifying the AvailablePlugin interface? Or maybe I'm missing something trivial here?

Hi Marcin,

You could use the incoming config on the collect, process, or publish calls. Process and Publish take config now, and collect would need to be updated to take a config.

The table inside the config-based router could be a map where the keys are the whole config, gob encoded and then converted into a string, and the values are the plugins which match that config. Looking at the strategy as it exists now, I think an abstraction over the []SelectablePlugin type might be nice: something that is essentially iterable, so it could be either k/v or indexed.

Given that abstraction, I think the config-based strategy would also need a custom pool which builds and stores the table as plugins are added, mainly because pool eligibility (does the pool need a new running plugin for this task?) depends on whether or not there is a plugin available to serve the config-based request.

This change is fairly sweeping to the strategy package and the pieces of available_plugin.go which touch that package, but I think it's an awesome idea to go ahead and start work on this. It'll force us to think about the right abstractions for strategies, as this'll be the first one that will really have a custom implementation of these abstractions!
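To make the custom pool idea from this thread concrete, here is a rough sketch. The `configPool` and `availablePlugin` types are hypothetical stand-ins, not Snap's real types, and the sketch assumes a config-derived key is attached to the plugin at insert time, which is exactly the open question raised above about Insert.

```go
package main

import "fmt"

// availablePlugin stands in for a running plugin. The configKey field is an
// assumption; the existing AvailablePlugin type does not store config data.
type availablePlugin struct {
	id        int
	configKey string
}

// configPool builds its lookup table as plugins are inserted.
type configPool struct {
	table map[string]*availablePlugin
}

func newConfigPool() *configPool {
	return &configPool{table: make(map[string]*availablePlugin)}
}

// Insert records the plugin under the key derived from its config.
func (p *configPool) Insert(ap *availablePlugin) {
	p.table[ap.configKey] = ap
}

// Eligible reports whether a new plugin is needed to serve this config key.
func (p *configPool) Eligible(key string) bool {
	_, ok := p.table[key]
	return !ok
}

// Select returns the plugin serving this config key, if any.
func (p *configPool) Select(key string) (*availablePlugin, bool) {
	ap, ok := p.table[key]
	return ap, ok
}

func main() {
	pool := newConfigPool()
	key := "rabbit-1/snap/metrics" // placeholder for an encoded or hashed config

	fmt.Println("needs new plugin:", pool.Eligible(key))
	pool.Insert(&availablePlugin{id: 1, configKey: key})
	fmt.Println("needs new plugin:", pool.Eligible(key))

	if ap, ok := pool.Select(key); ok {
		fmt.Println("routed to plugin", ap.id)
	}
}
```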