Design `nlp` submodule and skeleton of drift detection methods

Question

Opened this issue a year ago · 5 comments

Task

NLP-based drift detection algorithms do not always fit the definitions of data drift or concept drift, so they warrant a separate `nlp` submodule, along with a basic skeleton for language- or text-based algorithms.

Impact

Having the submodule and skeleton in place makes it easier to implement specific algorithms later.

Answer 1 · 2023-08-03T15:19:05.000Z

Worth thinking about returning to the pipeline idea:

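from toolz import pipe  # assumes a toolz-style pipe: pipe(data, f, g) == g(f(data))

class NLPMethod:
    def __init__(self, operators):
        self.operators = operators  # callables applied in order

    def run(self, data):
        return pipe(data, *self.operators)

n = NLPMethod(operators=[sklearn.some_preprocessor, transformers.some_transformer, some_encoder, some_evaluator])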
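
To make the pipeline idea concrete, here is a minimal, self-contained sketch. It is only an illustration under assumptions: `pipe` is written locally with `functools.reduce`, and the operators `lowercase`, `tokenize`, and `mean_length` are hypothetical stand-ins for the preprocessor, transformer, encoder, and evaluator stages mentioned above.

```python
from functools import reduce


def pipe(data, *operators):
    """Thread `data` through each operator in order."""
    return reduce(lambda acc, op: op(acc), operators, data)


class NLPMethod:
    def __init__(self, operators):
        self.operators = list(operators)  # callables applied left to right

    def run(self, data):
        return pipe(data, *self.operators)


# Hypothetical stand-in operators, not part of any real library.
def lowercase(docs):
    return [d.lower() for d in docs]


def tokenize(docs):
    return [d.split() for d in docs]


def mean_length(token_lists):
    return sum(len(t) for t in token_lists) / len(token_lists)


method = NLPMethod(operators=[lowercase, tokenize, mean_length])
result = method.run(["Drift happens", "Text data changes over time"])
```

Each operator only needs to accept the previous operator's output, so swapping in a different encoder or evaluator would not require changes to `NLPMethod` itself.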

Answer 2 · 2023-08-03T15:37:57.000Z

It is also worth exploring multi-threading, HPC, and GPU compatibility here. With a pipeline approach, several operators may be applied to the same data at a given stage, which is a good opportunity to demonstrate performance gains. We can use MD3 as a starting point.
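
As a rough sketch of what "several operators applied to the same data at a given stage" could look like, the example below fans one input out to multiple operators using the standard-library `ThreadPoolExecutor`. The helper `apply_stage` and the operators `vocab_size` and `mean_doc_length` are hypothetical names chosen for illustration. Threads mainly help when operators are I/O-bound or release the GIL; CPU-bound pure-Python operators would likely need processes or a GPU-backed library instead.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_stage(operators, data, max_workers=None):
    """Apply every operator in one stage to the same data, using threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(op, data) for op in operators]
        # Collect results in the same order as `operators`.
        return [f.result() for f in futures]


# Hypothetical operators that each summarize the same batch of documents.
def vocab_size(docs):
    return len({tok for d in docs for tok in d.split()})


def mean_doc_length(docs):
    return sum(len(d.split()) for d in docs) / len(docs)


docs = ["a b c", "a b", "d"]
stage_results = apply_stage([vocab_size, mean_doc_length], docs)
```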

Answer 3 · 2023-08-03T17:59:35.000Z

Also worth looking at iterators

Answer 4 · 2023-08-03T18:07:42.000Z

The `step` method below plays roughly the role of `next(iter)`:

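class FreeDetector:
    def step(self, inputs):
        # Transform the incoming data; `pipe` and `some_operators` are placeholders
        data = pipe(inputs, *some_operators)
        state = ...  # pipeline of operators applied to `data`, e.g. divergence metrics
        self.state = state

    def run(self):
        # Consume `self.data` one item at a time, whether it is a batch or a stream
        for inputs in self.data:
            self.step(inputs)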

Answer 5 · 2023-08-03T18:11:15.000Z

The `step`/`run` split above can help simplify a joint interface for batch and stream data, and it applies to NLP as well as other methods.
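
A minimal sketch of such a joint interface follows, assuming the detector wraps any iterable with `iter()` so that a list of batches and a generator are handled identically. The class `StepDetector`, its mean-document-length statistic, and the `threshold` parameter are hypothetical and chosen only to keep the example short; a real detector would compute a proper divergence metric. The sketch also assumes every batch is non-empty.

```python
class StepDetector:
    """Hypothetical detector whose `step` consumes one batch from any iterable."""

    def __init__(self, data, threshold=1.0):
        self._iter = iter(data)  # works for a list (batch) or a generator (stream)
        self.threshold = threshold
        self.reference = None
        self.state = None

    def step(self):
        batch = next(self._iter)  # raises StopIteration when data is exhausted
        mean_len = sum(len(doc.split()) for doc in batch) / len(batch)
        if self.reference is None:
            self.reference = mean_len  # treat the first batch as the reference
            self.state = "baseline"
        else:
            shifted = abs(mean_len - self.reference) > self.threshold
            self.state = "drift" if shifted else "ok"
        return self.state

    def run(self):
        states = []
        while True:
            try:
                states.append(self.step())
            except StopIteration:
                return states


batches = [["a b", "c d"], ["e f", "g h"], ["a b c d e f", "g h i j k l"]]
batch_states = StepDetector(batches).run()                 # in-memory list
stream_states = StepDetector(b for b in batches).run()     # lazy generator
```

Because `run` is just repeated calls to `step`, a caller processing a live stream can call `step` directly as new data arrives, while batch users call `run` once.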