mitre/menelaus

Design `nlp` submodule and skeleton of drift detection methods


Task

NLP-based drift detection algorithms do not always fit the data-drift or concept-drift definitions, so they should live in a separate `nlp` submodule. As a first step, we can build a basic skeleton for a language- or text-based algorithm.

Impact

Having the submodule and skeleton in place makes it easier to implement specific algorithms later on.

Worth thinking about returning to the pipeline idea:

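class NLPMethod:
    def __init__(self, operators):
        self.operators = operators

    def run(self, data):
        # pipe threads data through each operator in order, e.g. toolz.pipe(data, *funcs)
        return pipe(data, *self.operators)

# the some_* names are placeholders, not real library attributes
n = NLPMethod(operators=[sklearn.some_preprocessor, transformers.some_transformer, some_encoder, some_evaluator])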
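To make the pipeline idea concrete, here is a minimal runnable sketch. It assumes `pipe` has the same contract as `toolz.pipe` (data first, then functions applied in order) and defines it locally to stay dependency-free. The operators `lowercase`, `tokenize`, and `encode` are hypothetical stand-ins for real sklearn or transformers steps, not anything in menelaus.

```python
from collections import Counter
from functools import reduce


def pipe(data, *operators):
    """Thread data through each operator in order (same contract as toolz.pipe)."""
    return reduce(lambda value, op: op(value), operators, data)


# Hypothetical stand-in operators; real ones might wrap sklearn or transformers.
def lowercase(docs):
    return [d.lower() for d in docs]


def tokenize(docs):
    return [d.split() for d in docs]


def encode(token_lists):
    # Bag-of-words counts over the whole batch.
    return Counter(tok for toks in token_lists for tok in toks)


class NLPMethod:
    def __init__(self, operators):
        self.operators = list(operators)

    def run(self, data):
        # Each operator receives the previous operator's output.
        return pipe(data, *self.operators)


method = NLPMethod(operators=[lowercase, tokenize, encode])
counts = method.run(["Drift happens", "drift is subtle"])
```

Passing the data into `run` rather than rebinding `self` keeps the method object reusable across batches; whether operators should instead transform the method's own state is an open design question.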

Also worth exploring multi-threading / HPC / GPU compatibility here. If we adopt a pipeline approach, several operators may be applied to the same data at a given stage, which is a good opportunity to demonstrate performance gains. We can use MD3 as a starting point.
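As a rough illustration of several operators applied to the same data at one stage, the sketch below fans a batch out to a thread pool from the standard library. The helper `fan_out` and the two statistics are hypothetical names chosen for this example; they stand in for whatever drift metrics a real stage would compute.

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(data, operators, max_workers=None):
    """Apply every operator to the same data concurrently; results keep operator order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(op, data) for op in operators]
        return [f.result() for f in futures]


# Hypothetical per-batch statistics standing in for drift metrics.
def mean_length(docs):
    return sum(len(d.split()) for d in docs) / len(docs)


def vocab_size(docs):
    return len({tok for d in docs for tok in d.lower().split()})


batch = ["Drift happens", "drift is subtle"]
stats = fan_out(batch, [mean_length, vocab_size])
```

Threads only pay off when the operators release the GIL (for example NumPy-heavy code) or wait on I/O. For CPU-bound pure-Python operators, `ProcessPoolExecutor` from the same module would be the closer fit.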

Also worth looking at iterators:

  • `step` below is roughly `next(iter)`
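class FreeDetector:
    def __init__(self, data, operators):
        self.data = iter(data)      # batches, or single samples from a stream
        self.operators = operators  # e.g. divergence metrics
        self.state = None

    def step(self, inputs):
        # pipe threads inputs through each operator in order, e.g. toolz.pipe
        self.state = pipe(inputs, *self.operators)

    def run(self):
        # each iteration is one step, i.e. one next() on the data iterator
        for inputs in self.data:
            self.step(inputs)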

The above could give batch and stream data a single shared interface, and it can be made relevant for NLP as well as other methods.
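One way to sketch this joint interface is to make the detector itself an iterator, so that `step` is literally `next()` and `run` just exhausts whatever input remains. This is an assumption about how the design could look, not existing menelaus code; `batch_mean` is a hypothetical placeholder for a real drift statistic.

```python
class FreeDetector:
    """Hypothetical detector whose step() is one next() on the input iterator."""

    def __init__(self, data, operators):
        self._data = iter(data)
        self.operators = list(operators)
        self.state = None

    def __iter__(self):
        return self

    def __next__(self):
        value = next(self._data)  # StopIteration here ends the run
        for op in self.operators:
            value = op(value)
        self.state = value
        return value

    def step(self):
        return next(self)

    def run(self):
        # Exhaust the remaining inputs and collect each resulting state.
        return list(self)


def batch_mean(batch):
    return sum(batch) / len(batch)


# Batch use: each input is a list of values.
batch_states = FreeDetector([[1, 2, 3], [4, 5, 6]], [batch_mean]).run()

# Stream use: wrap each sample as a batch of one.
stream = ([x] for x in [1, 2, 3])
stream_detector = FreeDetector(stream, [batch_mean])
first = stream_detector.step()
rest = stream_detector.run()
```

The same class handles both cases because it only ever asks its input for the next item; the caller decides whether that item is a full batch or a single sample.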