yandex/rep

Cache classifier?

arogozhnikov opened this issue

Proposal: add a cache classifier for research workflows that require heavy computations.

Interface

clf = CacheClassifier(name='stage_1', base_estimator=XGBoostClassifier(...))
clf.fit(X, y, sample_weight)
clf.predict(...)

All methods are proxied to the base classifier (XGBoostClassifier in this case).
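One way the proxying could work is Python's `__getattr__`, which is only consulted when normal attribute lookup fails. This is a minimal sketch under that assumption. `CacheClassifierSketch` and `MajorityEstimator` are hypothetical names, not REP's API, and the caching itself is left out so that only the forwarding is shown. The guard on `base_estimator` matters for pickling: during unpickling the attribute does not exist yet, and without the guard the lookup would recurse forever.

```python
import pickle


class CacheClassifierSketch:
    """Hypothetical wrapper that forwards unknown attributes to base_estimator.

    Caching is omitted here; only the proxying is illustrated.
    """

    def __init__(self, name, base_estimator):
        self.name = name
        self.base_estimator = base_estimator

    def fit(self, X, y, sample_weight=None):
        # In the proposal, this is where the cache would be checked first.
        self.base_estimator.fit(X, y, sample_weight=sample_weight)
        return self

    def __getattr__(self, attr):
        # Only called when normal lookup fails, so predict, predict_proba, ...
        # are forwarded to the wrapped estimator.
        # Guard: while unpickling, base_estimator is not restored yet, and
        # looking it up here would otherwise recurse infinitely.
        if attr == "base_estimator":
            raise AttributeError(attr)
        return getattr(self.base_estimator, attr)


class MajorityEstimator:
    """Toy stand-in for XGBoostClassifier: always predicts the majority label."""

    def fit(self, X, y, sample_weight=None):
        labels = list(y)
        self.label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]


clf = CacheClassifierSketch(name="stage_1", base_estimator=MajorityEstimator())
clf.fit([[0], [1], [2]], [1, 1, 0])
predictions = clf.predict([[5], [6]])        # forwarded via __getattr__
restored = pickle.loads(pickle.dumps(clf))   # round-trips thanks to the guard
```

The wrapper defines `fit` itself because that is where caching would hook in; everything else is looked up on the base estimator.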

A copy of the trained classifier is saved to .rep_cache/stage_1.pkl, together with a hash of the dataset.
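The proposal does not say how the dataset hash is computed. As one possible choice, the sketch below assumes a SHA-1 over the pickled `X`, `y` and `sample_weight`, stored next to the estimator in `<cache_dir>/<name>.pkl`. `dataset_hash` and `save_to_cache` are hypothetical helpers. Pickle bytes are only guaranteed to match within the same Python and library versions, which seems acceptable for re-running a notebook in one environment. For large numpy or pandas inputs, a dedicated array hash would likely be faster.

```python
import hashlib
import os
import pickle
import tempfile


def dataset_hash(X, y, sample_weight=None):
    """Hypothetical fingerprint of the training data (SHA-1 of pickled parts).

    Assumes the same Python/library versions between runs, since pickled
    bytes can differ across versions.
    """
    digest = hashlib.sha1()
    for part in (X, y, sample_weight):
        digest.update(pickle.dumps(part, protocol=4))
    return digest.hexdigest()


def save_to_cache(name, estimator, data_hash, cache_dir=".rep_cache"):
    """Store the trained estimator and the dataset hash in <cache_dir>/<name>.pkl."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name + ".pkl")
    with open(path, "wb") as f:
        pickle.dump({"estimator": estimator, "data_hash": data_hash}, f)
    return path


X, y, w = [[0.5, 1.0], [1.5, 2.0]], [0, 1], [1.0, 1.0]
h1 = dataset_hash(X, y, w)
h2 = dataset_hash(X, [1, 0], w)  # labels changed, so the hash changes
with tempfile.TemporaryDirectory() as tmp:
    # A plain dict stands in for a trained estimator in this demo.
    path = save_to_cache("stage_1", {"toy": "estimator"}, h1, cache_dir=tmp)
    with open(path, "rb") as f:
        restored = pickle.load(f)
```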

The next time the notebook is executed, if the classifier parameters and the dataset hash are unchanged, fit only loads the already-trained estimator instead of retraining it.
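The load-or-train decision can be sketched as a single function. This assumes the estimator exposes sklearn-style `get_params()` and `fit(X, y, sample_weight=None)`, and it stores the parameters in the cache file alongside the data hash so both can be compared on the next run. `cached_fit` and `CountingEstimator` are hypothetical names for illustration.

```python
import hashlib
import os
import pickle
import tempfile


def data_hash(X, y, sample_weight=None):
    # Hypothetical pickle-based fingerprint (stable within one environment).
    h = hashlib.sha1()
    for part in (X, y, sample_weight):
        h.update(pickle.dumps(part, protocol=4))
    return h.hexdigest()


def cached_fit(name, estimator, X, y, sample_weight=None, cache_dir=".rep_cache"):
    """Train `estimator`, or load a cached copy if params and data hash match.

    Returns (fitted_estimator, loaded_from_cache).
    """
    path = os.path.join(cache_dir, name + ".pkl")
    params = estimator.get_params()
    fingerprint = data_hash(X, y, sample_weight)
    if os.path.exists(path):
        with open(path, "rb") as f:
            cached = pickle.load(f)
        if cached["params"] == params and cached["data_hash"] == fingerprint:
            return cached["estimator"], True  # cache hit: skip training
    estimator.fit(X, y, sample_weight=sample_weight)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump({"params": params, "data_hash": fingerprint,
                     "estimator": estimator}, f)
    return estimator, False


class CountingEstimator:
    """Toy stand-in for XGBoostClassifier that counts how often fit runs."""
    fit_calls = 0

    def __init__(self, depth=3):
        self.depth = depth

    def get_params(self):
        return {"depth": self.depth}

    def fit(self, X, y, sample_weight=None):
        CountingEstimator.fit_calls += 1
        self.n_samples_ = len(X)
        return self


with tempfile.TemporaryDirectory() as tmp:
    X, y = [[0.0], [1.0]], [0, 1]
    _, hit1 = cached_fit("stage_1", CountingEstimator(), X, y, cache_dir=tmp)
    _, hit2 = cached_fit("stage_1", CountingEstimator(), X, y, cache_dir=tmp)
    _, hit3 = cached_fit("stage_1", CountingEstimator(depth=5), X, y, cache_dir=tmp)
```

Here the first call trains, the second is served from the cache, and the third retrains because `depth` changed. Comparing parameters as a plain dict assumes they are picklable and support `==`.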

There are many possible caveats; the first is handling clone and pickle, and neither is trivial.
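To illustrate why clone is tricky with a proxying wrapper: sklearn's `clone` rebuilds an estimator as `type(est)(**est.get_params(deep=False))`. If `get_params` is forwarded to the base estimator, the wrapper is rebuilt with the base estimator's arguments and fails. The sketch below uses `simple_clone`, a simplified stand-in written for this example, not sklearn's actual code, together with hypothetical `ToyXGB`, `NaiveWrapper` and `CloneableWrapper` classes.

```python
import copy


def simple_clone(estimator):
    """Simplified imitation of sklearn.base.clone (an assumption, not its code)."""
    new_params = {}
    for key, value in estimator.get_params(deep=False).items():
        # Nested estimators are cloned recursively; other values are deep-copied.
        if hasattr(value, "get_params"):
            new_params[key] = simple_clone(value)
        else:
            new_params[key] = copy.deepcopy(value)
    return type(estimator)(**new_params)


class ToyXGB:
    """Toy stand-in for XGBoostClassifier."""

    def __init__(self, n_estimators=100):
        self.n_estimators = n_estimators

    def get_params(self, deep=True):
        return {"n_estimators": self.n_estimators}


class NaiveWrapper:
    def __init__(self, name, base_estimator):
        self.name = name
        self.base_estimator = base_estimator

    def __getattr__(self, attr):
        if attr == "base_estimator":
            raise AttributeError(attr)
        return getattr(self.base_estimator, attr)  # get_params gets forwarded too


class CloneableWrapper(NaiveWrapper):
    def get_params(self, deep=True):
        # Report the wrapper's own constructor arguments, not the base estimator's.
        return {"name": self.name, "base_estimator": self.base_estimator}


try:
    # Rebuilt as NaiveWrapper(n_estimators=10), which raises TypeError.
    simple_clone(NaiveWrapper("stage_1", ToyXGB(n_estimators=10)))
    naive_clone_failed = False
except TypeError:
    naive_clone_failed = True

cloned = simple_clone(CloneableWrapper("stage_1", ToyXGB(n_estimators=10)))
```

Even the working version raises a design question: every clone keeps `name="stage_1"`, so clones trained on different data (for example, cross-validation folds) would all write to the same cache file unless the naming scheme accounts for it.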

added caches in 9d35115

'Yes' to fast rerunning of analysis.