feat: allow extracting pipeline meta-information for evaluation
Closed this issue · 1 comments
MartinBernstorff commented
Behaviour:
We need to easily get the pipeline when evaluating. One way of doing that is saving to disk at a consistent location. Another is to just log it as an artifact, which seems nice!
Log artifacts to disk
This means the loggers need to be able to log artifacts in some way.
- For that to happen, we need some general way of saving to disk?
Currently, MLFlow takes a file and simply uploads it. Our other loggers can do the same, which leaves the client code responsible for writing the file to disk.
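A rough sketch of what that contract could look like, assuming each logger receives the path of a file that already exists. The ArtifactLogger protocol name, the log_dir field and both method bodies are assumptions for illustration, not the existing implementation:

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


class ArtifactLogger(Protocol):
    """Hypothetical protocol: the logger is handed a file that already exists."""

    def log_artifact(self, local_path: Path) -> None: ...


@dataclass
class DiskLogger:
    """Illustrative disk-backed logger; the log_dir field is an assumption."""

    log_dir: Path

    def log_artifact(self, local_path: Path) -> None:
        # Copy the file to a consistent location so evaluation can find it later.
        self.log_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(local_path, self.log_dir / local_path.name)


@dataclass
class TerminalLogger:
    """Illustrative terminal logger: it only reports where the file lives."""

    def log_artifact(self, local_path: Path) -> None:
        print(f"Artifact written to {local_path}")
```

With this shape, none of the loggers need to know how the artifact was serialised; they only move or report a file.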
Ensure logging of pipeline
- Ensure all tasks have a .pipe attribute, for example via multiple-inheritance of a shared protocol
- Write the sklearn-pipe to disk as .pkl
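The two points above could be sketched roughly as follows. HasPipe, ExampleTask and both helper functions are hypothetical names, and the sketch pickles whatever object sits on .pipe; in the real code that would be the task's pipeline rather than the stand-in used here:

```python
import pickle
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Protocol


class HasPipe(Protocol):
    """Hypothetical shared protocol: every task exposes a .pipe attribute."""

    pipe: Any


def write_pipe_to_disk(task: HasPipe, path: Path) -> Path:
    """Pickle task.pipe to `path` (expected to end in .pkl) and return the path."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(task.pipe, f)
    return path


def read_pipe_from_disk(path: Path) -> Any:
    """Load a previously pickled pipe, e.g. during evaluation."""
    # Only unpickle files you trust: pickle can execute arbitrary code on load.
    with path.open("rb") as f:
        return pickle.load(f)


@dataclass
class ExampleTask:
    """Stand-in task used for illustration only."""

    pipe: Any
```

Evaluation code could then call read_pipe_from_disk on the agreed location instead of rebuilding the pipeline.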
Required changes
- Refactor trainers to dataclasses
- Add log_artifact to:
  - BaselineLogger
  - TerminalLogger
  - MLFlowLogger
  - DiskLogger
- Add a .sklearn_pipeline property to BinaryClassificationPipeline and MultilabelClassificationPipeline (shared inheritance of a protocol)
  - Will be at task.pipe.sklearn_pipeline
- Add logging of the sklearn_pipe to the trainers
- Add _log_main_metric and _log_sklearn_pipe methods to the trainer protocols
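To make the intended flow concrete, here is a minimal sketch of how a trainer might combine these pieces. Every class below is a stand-in (StubPipeline, StubTask, RecordingLogger and StubTrainer are hypothetical), and the file name sklearn_pipe.pkl is an assumption. Only the names log_artifact, sklearn_pipeline, task.pipe and _log_sklearn_pipe come from the list above:

```python
import pickle
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Dict, List, Protocol


class SupportsLogArtifact(Protocol):
    def log_artifact(self, local_path: Path) -> None: ...


class HasSklearnPipeline(Protocol):
    @property
    def sklearn_pipeline(self) -> Any: ...


@dataclass
class StubPipeline:
    """Stand-in for e.g. BinaryClassificationPipeline; `steps` is an assumption."""

    steps: List[str]

    @property
    def sklearn_pipeline(self) -> Any:
        # The real class would return its underlying sklearn pipeline here.
        return self.steps


@dataclass
class StubTask:
    pipe: HasSklearnPipeline


@dataclass
class RecordingLogger:
    """Keeps the bytes of each logged artifact in memory, for illustration."""

    artifacts: Dict[str, bytes] = field(default_factory=dict)

    def log_artifact(self, local_path: Path) -> None:
        self.artifacts[local_path.name] = local_path.read_bytes()


@dataclass
class StubTrainer:
    task: StubTask
    logger: SupportsLogArtifact

    def _log_sklearn_pipe(self) -> None:
        # The client code writes the file; the logger only receives its path.
        with tempfile.TemporaryDirectory() as tmp:
            path = Path(tmp) / "sklearn_pipe.pkl"
            path.write_bytes(pickle.dumps(self.task.pipe.sklearn_pipeline))
            self.logger.log_artifact(path)
```

Writing to a temporary directory keeps the trainer free of any fixed output location. Where the artifact ends up is decided entirely by the logger.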