Design `nlp` submodule and skeleton of drift detection methods
Task
NLP-based drift detection algorithms do not always fit the data-drift or concept-drift definitions, so they warrant a separate `nlp` submodule, along with a basic skeleton for a language- or text-based algorithm.
Impact
This will make it easier to implement specific algorithms later on.
Worth thinking about returning to the pipeline idea:
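    class NLPMethod:
        def __init__(self, operators):
            self.operators = operators

        def run(self):
            # pipe(x, f, g) computes g(f(x)); return the result instead of rebinding `self`
            return pipe(self, *self.operators)

    n = NLPMethod(operators=[sklearn.some_preprocessor, transformers.some_transformer, some_encoder, some_evaluator])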
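A minimal runnable sketch of the pipeline idea, assuming a hand-rolled `pipe` helper and toy pure-Python operators. `tokenize`, `encode`, and `make_evaluator` are hypothetical stand-ins for the sklearn/transformers steps, not library APIs. Here the operators are applied to the data rather than to the method object, which is only one possible reading of the idea:

```python
from collections import Counter
from functools import reduce


def pipe(data, *funcs):
    """Thread `data` through `funcs` left to right: pipe(x, f, g) == g(f(x))."""
    return reduce(lambda acc, f: f(acc), funcs, data)


def tokenize(texts):
    # Hypothetical preprocessor: lowercase and split on whitespace.
    return [t.lower().split() for t in texts]


def encode(token_lists):
    # Hypothetical encoder: pool all documents into one bag-of-words count.
    return Counter(tok for toks in token_lists for tok in toks)


def make_evaluator(reference_counts):
    # Hypothetical evaluator: fraction of test tokens never seen in the reference.
    def evaluate(counts):
        total = sum(counts.values())
        unseen = sum(n for tok, n in counts.items() if tok not in reference_counts)
        return unseen / total if total else 0.0
    return evaluate


class NLPMethod:
    def __init__(self, operators):
        self.operators = operators

    def run(self, data):
        # Apply each operator in order and return the final result.
        return pipe(data, *self.operators)


reference = pipe(["the cat sat", "the dog sat"], tokenize, encode)
method = NLPMethod(operators=[tokenize, encode, make_evaluator(reference)])
score = method.run(["the cat sat", "a bird flew"])
```

Because every operator is just a callable from one value to the next, swapping the toy evaluator for a real divergence metric should not require touching `NLPMethod` itself.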
Also worth exploring multi-threading / HPC / GPU compatibility here. If adopting a pipeline approach, we may have several operators applied to the same data at a given stage, which is a good opportunity to demonstrate potential performance enhancements. We can use MD3 as a starting point.
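One way to read "several operators applied to the same data at a given stage" is as a fan-out step. A rough sketch using the standard library's `concurrent.futures`; `fan_out` and the two toy operators are hypothetical names. Threads only pay off when the operators release the GIL (I/O, NumPy, GPU calls), so this shows the shape of the stage rather than a benchmark:

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(data, operators, max_workers=None):
    """Apply every operator to the same `data` concurrently; results keep operator order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(op, data) for op in operators]
        return [f.result() for f in futures]


def mean_length(texts):
    # Toy operator: average number of whitespace-separated tokens per text.
    return sum(len(t.split()) for t in texts) / len(texts)


def vocabulary_size(texts):
    # Toy operator: number of distinct lowercase tokens across all texts.
    return len({tok for t in texts for tok in t.lower().split()})


batch = ["the cat sat", "the dog sat down"]
stats = fan_out(batch, [mean_length, vocabulary_size])
```

Since `fan_out` takes one value and returns one value, it could sit in an operator list like any other stage, e.g. via `functools.partial(fan_out, operators=[...])`.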
Also worth looking at iterators: in the sketch below, `step` ~ `next(iter)`.
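    class FreeDetector:
        def step(self, inputs):
            data = pipe(inputs, *some_operators)
            state = ...  # pipeline of operators applied to `data`, e.g. divergence metrics
            self.state = state

        def run(self):
            # self.data is any iterable of batches; a stream is the same loop with no fixed end
            for inputs in self.data:
                self.step(inputs)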
The above could simplify a joint interface for batch vs. stream data, since both reduce to repeated calls to `step`, and it can be made relevant for NLP and other methods.
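To make the batch/stream point concrete, here is a small sketch in which the detector only ever sees an iterable of batches. `StepDetector`, its threshold, and the mean-shift `state` are illustrative assumptions, not a proposed algorithm; a real detector would compute a divergence metric in `step`:

```python
from itertools import islice


class StepDetector:
    """Toy detector: flags a batch whose mean moves away from the first batch's mean."""

    def __init__(self, threshold=1.0):
        self.threshold = threshold
        self.reference = None  # mean of the first batch seen
        self.state = None      # latest absolute shift from the reference

    def step(self, inputs):
        mean = sum(inputs) / len(inputs)
        if self.reference is None:
            self.reference = mean
        self.state = abs(mean - self.reference)
        return self.state > self.threshold

    def run(self, data):
        # `data` is any iterable of batches: a list (batch mode) or a generator (stream mode).
        return [self.step(inputs) for inputs in data]


def stream():
    # Hypothetical endless source; the consumer decides how much to pull.
    value = 0.0
    while True:
        yield [value, value + 1.0]
        value += 1.0


batch_flags = StepDetector().run([[0.0, 1.0], [0.5, 1.5], [3.0, 4.0]])
stream_flags = StepDetector().run(islice(stream(), 4))
```

The same `run` handles a finite list and a slice of an endless generator, which is the sense in which `step` behaves like `next(iter)`: the caller controls how far the data is consumed.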