feat: allow extracting pipeline meta-information for evaluation

Question

Closed this issue 8 months ago · 1 comment

Behaviour:
We need an easy way to get the pipeline when evaluating. One option is to save it to disk at a consistent location. Another is to log it as an artifact, which seems nice!

Log artifacts to disk

This means the logger needs to be able to log artifacts in some way.

For that to happen, do we need some general way of saving to disk?

Currently, the MLFlow logger takes a file and just uploads it. Our other loggers can do the same. That leaves the client code responsible for writing the file to disk first.
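A minimal sketch of what the "take a file" contract could look like for a disk-based logger. The class body, the `log_dir` argument and the returned path are assumptions for illustration, not the existing implementation; only the `DiskLogger` and `log_artifact` names come from this issue.

```python
import shutil
from pathlib import Path


class DiskLogger:
    """Hypothetical disk logger that copies artifacts into one directory."""

    def __init__(self, log_dir: Path) -> None:
        # Assumed constructor argument: the consistent location on disk.
        self.log_dir = Path(log_dir)

    def log_artifact(self, local_path: Path) -> Path:
        # The client code has already written the file; the logger only
        # copies it, mirroring the file-path contract described above.
        local_path = Path(local_path)
        self.log_dir.mkdir(parents=True, exist_ok=True)
        destination = self.log_dir / local_path.name
        shutil.copy2(local_path, destination)
        return destination
```

With this shape, every logger accepts a path to an existing file, so the same client code can work with any of them.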

Ensure logging of pipeline

Ensure all tasks have a .pipe attribute, for example via multiple-inheritance of a shared protocol
Write the sklearn-pipe to disk as .pkl
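The two steps above could be sketched roughly as follows. The function name `log_sklearn_pipe`, the file name `sklearn_pipe.pkl` and the callable-based signature are assumptions; `pipe` stands in for whatever object the task exposes (in practice the sklearn pipeline), and any picklable object works for the sketch.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Any, Callable


def log_sklearn_pipe(pipe: Any, log_artifact: Callable[[Path], Any]) -> None:
    """Pickle `pipe` to a temporary .pkl file and pass its path to a logger."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        pkl_path = Path(tmp_dir) / "sklearn_pipe.pkl"  # hypothetical file name
        with pkl_path.open("wb") as f:
            pickle.dump(pipe, f)
        # The logger must copy or upload the file inside this call,
        # because the temporary directory is deleted on exit.
        log_artifact(pkl_path)
```

At evaluation time the artifact can be read back with `pickle.load`. Only unpickle files from a trusted source, since loading a pickle can execute arbitrary code.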

Required changes

Refactor trainers to dataclasses
Add log_artifact to:
- BaselineLogger
- TerminalLogger
- MLFlowLogger
- DiskLogger
Add a .sklearn_pipeline property to BinaryClassificationPipeline and MultilabelClassificationPipeline (shared inheritance of a protocol)
- Will be at task.pipe.sklearn_pipeline
Add logging of the sklearn_pipe to trainers
Add _log_main_metric and _log_sklearn_pipe methods to the trainer protocols
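One possible shape for the shared protocols, as a rough sketch. The protocol names `SupportsLogArtifact` and `HasSklearnPipeline`, the `Example*` classes and the helper `get_sklearn_pipeline` are hypothetical; only `log_artifact`, `sklearn_pipeline`, `TerminalLogger` and the `task.pipe.sklearn_pipeline` access path come from the list above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class SupportsLogArtifact(Protocol):
    """Anything with a log_artifact(local_path) method counts as a logger."""

    def log_artifact(self, local_path: Path) -> None: ...


class HasSklearnPipeline(Protocol):
    """Shared protocol for pipelines that expose the underlying sklearn object."""

    @property
    def sklearn_pipeline(self) -> Any: ...


@dataclass
class TerminalLogger:
    def log_artifact(self, local_path: Path) -> None:
        # A terminal logger has nowhere to store files, so it only reports.
        print(f"Artifact available at {local_path}")


@dataclass
class ExamplePipeline:
    steps: list  # placeholder for the real sklearn pipeline

    @property
    def sklearn_pipeline(self) -> Any:
        return self.steps


@dataclass
class ExampleTask:
    pipe: HasSklearnPipeline


def get_sklearn_pipeline(task: ExampleTask) -> Any:
    # Matches the access path named in the list: task.pipe.sklearn_pipeline
    return task.pipe.sklearn_pipeline
```

The sketch relies on structural typing, so the classes satisfy the protocols without inheriting from them. Explicitly inheriting from the protocol, as suggested above, would work too and makes the intent visible in the class definition.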

Answer 1 · 2024-01-16T08:48:57.000Z

Overlap with #523