hackingmaterials/automatminer

More pipeline diagnostics

Closed this issue · 4 comments

Besides #238, I think the diagnostics for a fitted pipe could be further improved. In particular, it's too difficult to determine which model actually performed best.

I agree, it could definitely be organized better.

If you're just interested in the underlying tpot model, you can get it with:

pipe.learner.best_pipeline

If you're interested in the best "entire" pipeline, going from material object to prediction (including featurization, cleaning, reduction, and learning), that's a bit trickier, because the fitted MatPipe itself is the best pipeline lol.

My thought is to add another method that returns only the most important information, e.g., which featurizers were used, what the cleaning rules are in general, what the best AutoML pipeline is, etc.


I think that would be nice!
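A minimal sketch of the summary method proposed above, written as a standalone function for illustration. Only `pipe.learner.best_pipeline` comes from this thread; `summarize_pipe`, `autofeaturizer.featurizers` and `cleaner.settings` are hypothetical names, not automatminer's API. Missing attributes fall back to defaults so the function can be tried on a stub object.

```python
from types import SimpleNamespace


def summarize_pipe(pipe):
    """Collect the most important facts about a fitted pipe into a plain dict.

    Hypothetical helper: only `learner.best_pipeline` is taken from the
    thread; the other attribute paths are assumptions for illustration.
    """

    def lookup(obj, dotted, default=None):
        # Walk a dotted attribute path; return `default` if any step is missing.
        for name in dotted.split("."):
            obj = getattr(obj, name, None)
            if obj is None:
                return default
        return obj

    return {
        # Assumed attribute path: featurizers chosen during featurization.
        "featurizers": lookup(pipe, "autofeaturizer.featurizers", []),
        # Assumed attribute path: settings used when cleaning the data.
        "cleaning": lookup(pipe, "cleaner.settings", {}),
        # From the thread: the best AutoML pipeline found by the learner.
        "best_pipeline": lookup(pipe, "learner.best_pipeline"),
    }


# Stub standing in for a fitted pipe (no cleaner attribute on purpose).
stub = SimpleNamespace(
    autofeaturizer=SimpleNamespace(featurizers=["ElementProperty"]),
    learner=SimpleNamespace(best_pipeline="Pipeline(...)"),
)
print(summarize_pipe(stub))
# {'featurizers': ['ElementProperty'], 'cleaning': {}, 'best_pipeline': 'Pipeline(...)'}
```

Returning a plain dict keeps the summary easy to print or dump to JSON, which suits a quick diagnostic overview.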

It took me some time to discover that pipe.learner.best_pipeline and pipe.learner.best_models were what I was looking for. However, I noticed that these aren't available on pipes that have been saved and loaded.

For TPOT pipelines that are saved and loaded, you are correct, because pickling TPOT objects didn't work the last time I checked (that may have changed since). The current behavior is to select the best pipeline from the TPOT object and save that single sklearn Pipeline as the backend (similar to a SinglePipelineAdaptor learner object). So the entire backend becomes the "best pipeline", and unfortunately all the other, previously tried models are lost :/
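The save behavior described above can be mimicked with a toy class to show why the search history disappears after a save/load round trip. This is a stdlib-only sketch under assumptions: `ToyLearner`, its `fit(scores)` signature and the string "pipelines" are hypothetical stand-ins, not automatminer's or TPOT's actual classes. It only reproduces the pattern of keeping the winner as `backend` and dropping `best_models`.

```python
import pickle


class ToyLearner:
    """Hypothetical stand-in for an AutoML learner (not automatminer's class).

    While fitting it records every candidate it tried (`best_models`) and the
    winner (`best_pipeline`). When pickled, only the winner survives, as
    `backend`, mimicking the behavior described in the thread.
    """

    def __init__(self):
        self.best_models = {}      # candidate name -> validation score
        self.best_pipeline = None  # name of the top-scoring candidate
        self.backend = None        # what actually gets persisted

    def fit(self, scores):
        # `scores` is a hypothetical {candidate: score} dict; higher is better.
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        self.best_models = dict(ranked)
        self.best_pipeline = ranked[0][0]
        self.backend = self.best_pipeline
        return self

    def __getstate__(self):
        # Persist only the winning pipeline; the search history is dropped.
        return {"backend": self.backend}

    def __setstate__(self, state):
        # best_models / best_pipeline are deliberately not restored.
        self.backend = state["backend"]


learner = ToyLearner().fit({"RandomForest": 0.81, "GradientBoosting": 0.87, "LinearSVR": 0.62})
restored = pickle.loads(pickle.dumps(learner))
print(restored.backend)                  # GradientBoosting
print(hasattr(restored, "best_models"))  # False
```

After the round trip only `backend` is left, which matches why `best_models` is unavailable on a loaded pipe while the best pipeline is still reachable.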

TL;DR: you can access the best pipeline of a loaded (TPOT-backend) pipe using:

pipe.learner.backend

Only the best pipeline is saved; best_models is not.

I've opened an issue addressing this: #241

Related to #221.