Aarhus-Psychiatry-Research/psycop-common

feat: allow extracting pipeline meta-information for evaluation


Behaviour:
We need easy access to the trained pipeline when evaluating. One option is to save it to disk at a consistent location. Another is to log it as an artifact, which seems nicer.

Log artifacts to disk

This means the loggers need to be able to log artifacts in some way.

  • For that to happen, we need some general way of saving to disk?

Currently, MLFlow takes a file on disk and uploads it. Our other loggers can do the same. The client code is then responsible for writing the file to disk before handing it to the logger.
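
A minimal sketch of that split of responsibilities, assuming a `log_artifact(local_path)` signature (the issue does not fix the signature, so it is an assumption here). The `ArtifactLogger` protocol and the `log_dir` argument are hypothetical names used for illustration only; the real loggers have more methods than shown.

```python
import shutil
from pathlib import Path
from typing import Protocol


class ArtifactLogger(Protocol):
    """Hypothetical minimal interface: the caller has already written the file."""

    def log_artifact(self, local_path: Path) -> None: ...


class TerminalLogger:
    def log_artifact(self, local_path: Path) -> None:
        # Nothing to upload, so just report where the file lives.
        print(f"Artifact available at {local_path}")


class DiskLogger:
    def __init__(self, log_dir: Path) -> None:
        # `log_dir` is an assumed constructor argument for this sketch.
        self.log_dir = log_dir
        self.log_dir.mkdir(parents=True, exist_ok=True)

    def log_artifact(self, local_path: Path) -> None:
        # Copy the client-written file to the consistent location.
        shutil.copy2(local_path, self.log_dir / local_path.name)
```

With this shape, an MLFlow-backed logger could forward `local_path` to `mlflow.log_artifact`, and none of the loggers need to know how the file was produced.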

Ensure logging of pipeline

  • Ensure all tasks have a .pipe attribute, for example via multiple-inheritance of a shared protocol
  • Write the sklearn-pipe to disk as .pkl
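
As a rough illustration of the `.pkl` round trip, the sketch below writes an object to disk with the standard-library `pickle` module and reads it back at evaluation time. The function names and the `sklearn_pipe.pkl` filename are hypothetical, and a plain dict stands in for the fitted sklearn pipeline so the example has no third-party dependencies.

```python
import pickle
from pathlib import Path
from typing import Any


def dump_pipe(sklearn_pipe: Any, path: Path) -> Path:
    """Serialise the pipeline to `path` so a logger can pick it up as a file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(sklearn_pipe, f)
    return path


def load_pipe(path: Path) -> Any:
    """Restore the pipeline during evaluation. Only unpickle trusted files."""
    with path.open("rb") as f:
        return pickle.load(f)
```

Usage would be along the lines of `logger.log_artifact(dump_pipe(task.pipe.sklearn_pipeline, tmp_dir / "sklearn_pipe.pkl"))`, assuming the logger exposes `log_artifact` as proposed in this issue.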

Required changes

  • Refactor trainers to dataclasses

  • Add log_artifact to:

    • BaselineLogger
    • TerminalLogger
    • MLFlowLogger
    • DiskLogger
  • Add a .sklearn_pipeline property to BinaryClassificationPipeline and MultilabelClassificationPipeline (shared inheritance of a protocol)

    • Will be at task.pipe.sklearn_pipeline
  • Add logging of the sklearn_pipe to the trainers

  • Add _log_main_metric and _log_sklearn_pipe methods to the trainer protocols
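
The required changes could fit together roughly as follows. This is a sketch under stated assumptions, not the repository's implementation: `HasSklearnPipeline`, `Task`, `Logger`, `ExampleTrainer`, `log_metric`, and `metric_name` are hypothetical names, and only `log_artifact`, `.pipe`, `.sklearn_pipeline`, `_log_main_metric`, and `_log_sklearn_pipe` come from the issue text.

```python
import pickle
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Protocol


class HasSklearnPipeline(Protocol):
    """Shared protocol for the classification pipelines (hypothetical name)."""

    @property
    def sklearn_pipeline(self) -> Any: ...


class Task(Protocol):
    pipe: HasSklearnPipeline


class Logger(Protocol):
    # `log_metric` is an assumed method; `log_artifact` is the proposed one.
    def log_metric(self, name: str, value: float) -> None: ...
    def log_artifact(self, local_path: Path) -> None: ...


@dataclass
class ExampleTrainer:
    task: Task
    logger: Logger
    metric_name: str = "main_metric"

    def _log_main_metric(self, value: float) -> None:
        self.logger.log_metric(self.metric_name, value)

    def _log_sklearn_pipe(self) -> None:
        # The temporary directory is removed on exit, so the logger must
        # copy or upload the file before log_artifact returns.
        with tempfile.TemporaryDirectory() as tmp_dir:
            path = Path(tmp_dir) / "sklearn_pipe.pkl"
            path.write_bytes(pickle.dumps(self.task.pipe.sklearn_pipeline))
            self.logger.log_artifact(path)
```

Keeping the trainer a dataclass makes its dependencies explicit fields, so any logger that satisfies the protocol can be swapped in without changing the trainer.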

Overlaps with #523