Cache classifier?
arogozhnikov opened this issue · 1 comment
arogozhnikov commented
Proposal: add a cache classifier for research that requires heavy computation.
Interface
clf = CacheClassifier(name='stage_1', base_estimator=XGBoostClassifier(...))
clf.fit(X, y, sample_weight)
clf.predict(...)
All methods are proxied to the initial classifier (XGBoostClassifier in this case).
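Proxying every method can be done with attribute delegation through `__getattr__`, which Python calls only when normal attribute lookup fails. The following is a minimal sketch with hypothetical names: `ToyClassifier` stands in for XGBoostClassifier, and `ProxyClassifier` is an illustrative wrapper with no caching yet.

```python
class ToyClassifier:
    """Hypothetical stand-in for XGBoostClassifier: predicts the weighted majority class."""

    def __init__(self, default=0):
        self.default = default

    def fit(self, X, y, sample_weight=None):
        # weighted vote over the labels seen during training
        weights = sample_weight if sample_weight is not None else [1.0] * len(y)
        totals = {}
        for label, w in zip(y, weights):
            totals[label] = totals.get(label, 0.0) + w
        self.majority_ = max(totals, key=totals.get)
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


class ProxyClassifier:
    """Hypothetical wrapper: anything not defined here is looked up on base_estimator."""

    def __init__(self, name, base_estimator):
        self.name = name
        self.base_estimator = base_estimator

    def fit(self, X, y, sample_weight=None):
        # a caching version would decide here whether to train or to load
        self.base_estimator.fit(X, y, sample_weight=sample_weight)
        return self

    def __getattr__(self, attr):
        # only reached when normal lookup fails, e.g. for predict or majority_;
        # the guard stops infinite recursion if base_estimator itself is missing
        if attr == 'base_estimator':
            raise AttributeError(attr)
        return getattr(self.base_estimator, attr)


clf = ProxyClassifier(name='stage_1', base_estimator=ToyClassifier())
clf.fit([[0], [1], [2]], [1, 1, 0])
print(clf.predict([[5], [6]]))  # [1, 1]
```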
A copy of the trained classifier is saved at .rep_cache/stage_1.pkl, together with a hash of the dataset.
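Saving the trained classifier alongside a dataset hash could look like the sketch below. It rests on assumptions not stated in the proposal: the dataset is hashed by pickling `(X, y, sample_weight)` and taking a SHA-1 digest, the hash is stored in the same .pkl file as the estimator, and the helpers `dataset_hash`, `save_to_cache` and `load_from_cache` are hypothetical names.

```python
import hashlib
import os
import pickle
import tempfile


def dataset_hash(X, y, sample_weight=None):
    """SHA-1 digest of the pickled training data (assumes X, y, sample_weight are picklable)."""
    payload = pickle.dumps((X, y, sample_weight), protocol=pickle.HIGHEST_PROTOCOL)
    return hashlib.sha1(payload).hexdigest()


def save_to_cache(name, estimator, data_hash, folder='.rep_cache'):
    """Store the trained estimator together with the dataset hash in <folder>/<name>.pkl."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, name + '.pkl')
    with open(path, 'wb') as f:
        pickle.dump({'data_hash': data_hash, 'estimator': estimator}, f)
    return path


def load_from_cache(name, folder='.rep_cache'):
    """Return the stored {'data_hash', 'estimator'} record, or None if there is none."""
    path = os.path.join(folder, name + '.pkl')
    if not os.path.exists(path):
        return None
    with open(path, 'rb') as f:
        return pickle.load(f)


# usage in a throwaway folder; a plain dict stands in for a trained estimator
folder = tempfile.mkdtemp()
h = dataset_hash([[0.1], [0.7]], [0, 1])
path = save_to_cache('stage_1', {'trained': True}, h, folder=folder)
record = load_from_cache('stage_1', folder=folder)
print(os.path.basename(path), record['data_hash'] == h)  # stage_1.pkl True
```

Hashing pickled bytes is simple but conservative: pickle output can differ between Python or library versions, in which case the hash changes and the classifier is just retrained rather than wrongly reused.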
The next time the notebook is executed, if the classifier has the same parameters and the dataset hash is unchanged, the fit method only loads the already-trained estimator instead of training it again.
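The load-or-train decision in fit can be sketched as follows. Assumptions, none of them from the proposal: the base estimator follows the scikit-learn convention of exposing `get_params()`; the cache key is a SHA-1 digest over the estimator's class name, its sorted parameters and the pickled data; `ToyClassifier`, its `fit_count` counter and `SketchCacheClassifier` are hypothetical names for illustration, not REP's API.

```python
import hashlib
import os
import pickle
import tempfile


class ToyClassifier:
    """Hypothetical base estimator with a scikit-learn style get_params()."""
    fit_count = 0  # counts real trainings, to show when the cache is hit

    def __init__(self, threshold=0.5):
        self.threshold = threshold

    def get_params(self, deep=True):
        return {'threshold': self.threshold}

    def fit(self, X, y, sample_weight=None):
        ToyClassifier.fit_count += 1
        self.fitted_ = True
        return self

    def predict(self, X):
        return [int(x[0] > self.threshold) for x in X]


class SketchCacheClassifier:
    """Hypothetical cache wrapper: retrains only when parameters or data change."""

    def __init__(self, name, base_estimator, folder='.rep_cache'):
        self.name = name
        self.base_estimator = base_estimator
        self.folder = folder

    def _cache_key(self, X, y, sample_weight):
        # key = classifier type + its parameters + the training data
        params = sorted(self.base_estimator.get_params().items())
        payload = pickle.dumps((type(self.base_estimator).__name__, params, X, y, sample_weight))
        return hashlib.sha1(payload).hexdigest()

    def fit(self, X, y, sample_weight=None):
        key = self._cache_key(X, y, sample_weight)
        path = os.path.join(self.folder, self.name + '.pkl')
        if os.path.exists(path):
            with open(path, 'rb') as f:
                stored_key, estimator = pickle.load(f)
            if stored_key == key:
                # same parameters and same data: reuse the trained estimator
                self.base_estimator = estimator
                return self
        # cache miss or stale entry: train, then overwrite the cache file
        self.base_estimator.fit(X, y, sample_weight=sample_weight)
        os.makedirs(self.folder, exist_ok=True)
        with open(path, 'wb') as f:
            pickle.dump((key, self.base_estimator), f)
        return self

    def predict(self, X):
        return self.base_estimator.predict(X)


folder = tempfile.mkdtemp()
X, y = [[0.2], [0.9]], [0, 1]
SketchCacheClassifier('stage_1', ToyClassifier(), folder=folder).fit(X, y)
clf = SketchCacheClassifier('stage_1', ToyClassifier(), folder=folder).fit(X, y)  # loaded, not retrained
print(ToyClassifier.fit_count)  # 1
print(clf.predict([[0.7]]))  # [1]
```

Changing any parameter (for example `threshold`) changes the key, so the next fit trains again and overwrites the stale file.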
There are many possible caveats; the first to think about is handling clone and pickle. Neither is trivial.
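Two of these caveats can be shown with a stdlib-only sketch. First, scikit-learn's `clone` roughly rebuilds an estimator as `type(est)(**est.get_params(deep=False))`, so if a proxying wrapper forwarded `get_params` to the base estimator, it would be rebuilt with the base estimator's arguments and fail; the wrapper therefore defines its own `get_params`. Second, pickle restores an object without calling `__init__`, so an unguarded `__getattr__` that reads `self.base_estimator` can recurse forever while unpickling. `simple_clone` below is a simplified, hypothetical imitation of clone's approach, not scikit-learn's code; `ToyClassifier` and `CacheWrapper` are hypothetical too.

```python
import copy
import pickle


class ToyClassifier:
    """Hypothetical base estimator."""

    def __init__(self, n_trees=10):
        self.n_trees = n_trees

    def get_params(self, deep=True):
        return {'n_trees': self.n_trees}

    def predict(self, X):
        return [0 for _ in X]


class CacheWrapper:
    """Hypothetical proxy that survives clone-style rebuilding and pickling."""

    def __init__(self, name, base_estimator):
        self.name = name
        self.base_estimator = base_estimator

    def get_params(self, deep=True):
        # the wrapper's own constructor arguments, NOT the base estimator's
        return {'name': self.name, 'base_estimator': self.base_estimator}

    def __getattr__(self, attr):
        # while unpickling, __dict__ is still empty; without this guard, reading
        # self.base_estimator would call __getattr__ again and recurse forever.
        # Dunder lookups (e.g. __setstate__, __deepcopy__) are never forwarded.
        if attr == 'base_estimator' or attr.startswith('__'):
            raise AttributeError(attr)
        return getattr(self.base_estimator, attr)


def simple_clone(estimator):
    """Simplified imitation of the clone approach: rebuild from get_params(deep=False)."""
    params = estimator.get_params(deep=False)
    params = {k: simple_clone(v) if hasattr(v, 'get_params') else copy.deepcopy(v)
              for k, v in params.items()}
    return type(estimator)(**params)


clf = CacheWrapper('stage_1', ToyClassifier(n_trees=50))
restored = pickle.loads(pickle.dumps(clf))  # no RecursionError thanks to the guard
twin = simple_clone(clf)  # rebuilt as CacheWrapper(name=..., base_estimator=...)
print(restored.n_trees, twin.n_trees, twin.base_estimator is clf.base_estimator)  # 50 50 False
```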
arogozhnikov commented
Added caches in 9d35115.
'Yes' to fast reruns of the analysis.