alex-pirozhenko/sklearn-pmml

PMML Import


Full persistence of models requires PMML import, particularly for my use case.

Every morning, I run a task that compiles features, loads the model, runs predictions on those features, and sends off the predictions.

While this could be handled by a service that trains the model and then keeps the estimator in memory, any failure means retraining the model. Supplying only the hyperparameters to a new estimator won't work, because the fitted state is lost.
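To illustrate the hyperparameter point, here is a minimal sketch with a hypothetical stand-in estimator (`ShrunkenMean` is invented for this example and is not part of sklearn or sklearn-pmml). The hyperparameter is known before training, but the learned value only exists after `fit`, so rebuilding from hyperparameters alone gives an untrained model.

```python
class ShrunkenMean:
    """Hypothetical toy estimator: predicts a shrunken mean of the training targets."""

    def __init__(self, shrinkage=0.0):
        self.shrinkage = shrinkage  # hyperparameter, chosen before training

    def get_params(self):
        # Only the constructor arguments, none of the learned state.
        return {"shrinkage": self.shrinkage}

    def fit(self, y):
        # Learned state, available only after training.
        self.mean_ = (1.0 - self.shrinkage) * sum(y) / len(y)
        return self

    def predict(self, n):
        return [self.mean_] * n


trained = ShrunkenMean(shrinkage=0.5).fit([2.0, 4.0, 6.0])

# Same hyperparameters, but no training has happened on this object.
rebuilt = ShrunkenMean(**trained.get_params())
rebuilt_is_fitted = hasattr(rebuilt, "mean_")
```

Here `rebuilt` carries the same `shrinkage` as `trained` but has no `mean_`, so it cannot predict until it is fitted again.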

Full PMML import/export is necessary in my case, unless I can re-engineer the process. I'm not sure there's much to be gained by this, so it's a tough sell internally.

Just to clarify: does the training flow in your case use a different stack (anything other than sklearn)? If not, you could simply serialize your pre-trained model with pickle for later use; it's much more convenient than PMML.
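A rough sketch of that pickle round trip, using only the standard library. The "model" here is a toy stand-in (a dict of least-squares coefficients produced by the hypothetical helpers `fit_line` and `predict`); with sklearn, the fitted estimator object itself would be passed to `pickle.dump` in the same way. The file name is also an assumption.

```python
import os
import pickle
import tempfile


def fit_line(xs, ys):
    """Toy stand-in for training: ordinary least squares on a single feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def predict(model, xs):
    return [model["slope"] * x + model["intercept"] for x in xs]


# Training job: fit once and persist the fitted state to disk.
model = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "model.pkl")  # hypothetical file name
    with open(path, "wb") as f:
        pickle.dump(model, f)

    # Daily job: load the persisted model and predict, with no retraining.
    with open(path, "rb") as f:
        restored = pickle.load(f)

predictions = predict(restored, [4.0, 5.0])
```

Two caveats apply to this approach: a pickle is only reliably loadable with the same Python and library versions that wrote it, and `pickle.load` should only be used on files from a trusted source.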

At the same time I agree that PMML->sklearn import would be a valuable feature to this project. We're open to PRs.

I’m using joblib.load right now. It seems to work well, but I do know there are problems with serializing models from Python.

I would like to be able to train models in scikit-learn and then import them into Spark. That's my ultimate workflow. MLlib does not currently import PMML.

Everybody exports to PMML (including R), but few systems will import PMML. This defeats the entire purpose of the standard, in my opinion.

If you're just using Spark to evaluate a model trained in sklearn, you could use the JPMML evaluator; I've used it with Scalding and Pig on several occasions. This would require using the Scala or Java Spark APIs, though. If I were using Python, I'd probably just use pickle.

Hmm, apparently my future problems will just go away. Thanks @DarinJ for the info.

We have no plans to use Spark. So PMML import is back on the table for me.