mitre/menelaus

Can we sync/handle burn-ins for streaming ensembles?

Anmol-Srivastava opened this issue · 4 comments

See callable class evaluator idea in #101

Having classes for evaluators might help us move towards storing object state over time, which would help with this issue. Perhaps these classes should also be called Election, or something similarly easy to understand when initializing an ensemble, which could have a default election scheme as well.

Making a note here that if we switch to functions/classes for the evaluators, we probably want to drop the EVALUATORS global, so the import will look like from menelaus.ensemble import Ensemble, eval_confirmed_approval (or similar) rather than from menelaus.ensemble import Ensemble, EVALUATORS.
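
To make that concrete, here is a rough sketch of an ensemble that takes the evaluator as a plain callable, with a default. Everything in it is a stand-in defined inline for illustration: the toy Ensemble, the election parameter, and eval_simple_majority are assumptions, not the current menelaus API.

```python
from types import SimpleNamespace


def eval_simple_majority(detectors):
    """Hypothetical function-style evaluator: drift if over half the members alarm."""
    num_drift = sum(d.drift_state == "drift" for d in detectors)
    return "drift" if num_drift > len(detectors) / 2 else None


class Ensemble:
    """Toy ensemble (not the menelaus class) that receives its evaluator directly."""

    def __init__(self, detectors, election=eval_simple_majority):
        self.detectors = detectors
        self.election = election  # any callable: a function or a class instance
        self.drift_state = None

    def evaluate(self):
        # The ensemble state is whatever the election decides from the members.
        self.drift_state = self.election(self.detectors)
        return self.drift_state


# Stub detectors that expose only a drift_state attribute.
members = [
    SimpleNamespace(drift_state="drift"),
    SimpleNamespace(drift_state="drift"),
    SimpleNamespace(drift_state=None),
]
ensemble = Ensemble(members)  # no election given, so the default is used
state = ensemble.evaluate()
```

With this shape there is no registry to look names up in; the user imports the evaluator they want and hands it over, or omits it and gets the default.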

The two cases from the literature that it'd be good to handle are Maciel 2015 and Du 2014.
Du:

  • On a second read, I think the elaboration is more to do with which detectors are chosen for the ensemble, based on their performance in one or more reference datasets. The core loop is -- if any of the members alarm, reset everything and retrain. The minimum_approval evaluator handles that case.
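
As a sketch of that core loop, assuming stub detectors and a hypothetical eval_minimum_approval written inline (neither is taken from menelaus or from the paper): any single alarm puts the ensemble in drift, and every member is then reset.

```python
class StubDetector:
    """Stand-in detector: alarms when a value exceeds a fixed threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.drift_state = None
        self.resets = 0

    def update(self, x):
        self.drift_state = "drift" if x > self.threshold else None

    def reset(self):
        # A real detector would also discard its reference data and retrain.
        self.drift_state = None
        self.resets += 1


def eval_minimum_approval(detectors, approvals_needed=1):
    """Hypothetical evaluator: drift once enough members alarm."""
    num_drift = sum(d.drift_state == "drift" for d in detectors)
    return "drift" if num_drift >= approvals_needed else None


def run(stream, detectors):
    """Feed the stream to every member; on any alarm, reset all of them."""
    alarms = []
    for i, x in enumerate(stream):
        for d in detectors:
            d.update(x)
        if eval_minimum_approval(detectors) == "drift":
            alarms.append(i)
            for d in detectors:
                d.reset()
    return alarms


detectors = [StubDetector(5), StubDetector(10)]
alarms = run([1, 2, 7, 3, 12], detectors)
```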

Maciel:

  • The idea here is, if a detector alarms, then it waits for a certain number of samples for confirmation from another detector. In the meantime, it resets and starts "retraining," and its output is ignored. If another detector fires within the window, the whole ensemble enters the drift state.
  • The wait time could be smaller than the burn-in/window size/etc. for member detectors, so we'll need to put a warning in the docstring.
  • The example from the original paper makes use of detectors which we don't currently have implemented. Maybe worth a TODO later.
  • The novelty here, and the motivation for the class evaluators over functions, is that the Evaluator needs to remember its own state in between updates - it's not a strict function of the detector list.
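
For the wait-time caveat, a runtime check could complement the docstring warning. This is only a sketch: check_wait_time is a made-up helper, and it assumes each member exposes a burn_in attribute, which real detectors may name differently or not have at all.

```python
import warnings
from types import SimpleNamespace


def check_wait_time(wait_time, detectors):
    """Warn if the confirmation window is shorter than a member's burn-in.

    Assumes a hypothetical `burn_in` attribute on each detector; members
    without one are skipped. Returns True when no warning was needed.
    """
    burn_ins = [d.burn_in for d in detectors if hasattr(d, "burn_in")]
    if burn_ins and wait_time < max(burn_ins):
        warnings.warn(
            f"wait_time={wait_time} is shorter than the longest member "
            f"burn-in ({max(burn_ins)}).",
            UserWarning,
        )
        return False
    return True


# Stub members carrying only the assumed attribute.
members = [SimpleNamespace(burn_in=30), SimpleNamespace(burn_in=100)]
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    short_ok = check_wait_time(50, members)
long_ok = check_wait_time(100, members)
```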

I wrote something up that I think does most of what's needed for the Maciel case. We can probably shuffle the references to the detector list between init and call. I need to do another pass to make sure its behavior is equivalent to the paper.

class MacEvaluator:
    def __init__(self, sens, wait_time, detectors):
        # one counter per detector, in the same order as the detector list/dict;
        # 0 means "not waiting". this needs to persist between iterations
        self.waiting_counters = [0 for _ in detectors]
        self.sens = sens
        self.wait_time = wait_time

    def __call__(self, detectors):
        num_drift = 0
        num_warning = 0
        states = [d.drift_state for d in detectors]
        for i, state in enumerate(states):
            if self.waiting_counters[i] != 0:
                # detector i is in a "waiting" state: its current response is ignored,
                # it still counts toward the drift total, and its counter is incremented
                num_drift += 1
                self.waiting_counters[i] += 1
            elif state == "drift":
                # fresh alarm: count it and start the waiting period
                num_drift += 1
                self.waiting_counters[i] += 1
            elif state == "warning":
                num_warning += 1

        if num_drift >= self.sens:
            result = "drift"
        elif num_warning + num_drift >= self.sens:
            result = "warning"
        else:
            result = None

        # see if any of the detectors have fallen out of confirmation
        # this may be better framed as a "reset" method
        # double-check that this occurs in the right place, as we have to do it post-update and the paper does it pre-update
        for i, count in enumerate(self.waiting_counters):
            if count > self.wait_time:
                self.waiting_counters[i] = 0

        return result

Also, I really like the idea of calling them Elections.

Much later, if we run into an election needing information about the ensemble's properties, there is precedent I've seen in other work for passing that along:

class A():
    def __init__(self, rules_to_execute):
        self.rules_to_execute = rules_to_execute

    def compute(self):
        for rule in self.rules_to_execute:
            output = rule(self)
            # store/process output
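
Applied to this issue, the same pattern might look like the sketch below, where the election receives the whole ensemble and can read an ensemble-level burn-in. The toy Ensemble, its burn_in and total_updates attributes, and election_after_burn_in are all hypothetical names for illustration.

```python
from types import SimpleNamespace


def election_after_burn_in(ensemble):
    """Hypothetical election that suppresses votes during the ensemble burn-in."""
    if ensemble.total_updates <= ensemble.burn_in:
        return None
    if any(d.drift_state == "drift" for d in ensemble.detectors):
        return "drift"
    return None


class Ensemble:
    """Toy ensemble (not the menelaus class) that passes itself to its election."""

    def __init__(self, detectors, election, burn_in=0):
        self.detectors = detectors
        self.election = election
        self.burn_in = burn_in
        self.total_updates = 0
        self.drift_state = None

    def step(self):
        # Member updates are omitted; only the hand-off of self is shown.
        self.total_updates += 1
        self.drift_state = self.election(self)
        return self.drift_state


members = [SimpleNamespace(drift_state="drift")]
ensemble = Ensemble(members, election_after_burn_in, burn_in=2)
history = [ensemble.step() for _ in range(3)]
```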

There's also an SO answer describing passing around references to the outer object.

There is no problem with that whatsoever - self is an object like any other and may be used in any context where object of its type/behavior would be welcome.

In Python, as exemplified by the standard library, instances of self get passed to functions (and also to methods, and even operators) all the time.

(but no need to commit to that yet)