epeters3/skplumber

Primitives should support sequential dataset fit

epeters3 opened this issue · 0 comments

The same instantiated primitive should be able to be fit on one dataset and then on another. All the sklearn primitives should already support this, but the custom primitives do not. E.g. the one-hot encoder, when fit, keeps track of all the categorical columns, but when fit to a new dataset, it does not clear out the old columns it was tracking, so the columns it thinks the new dataset has are the union of the old dataset's columns and the new dataset's columns.

Add a test case for this by fitting on one dataset, then another, to make sure no errors occur.
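
A rough sketch of what the fix and the test could look like. `OneHotEncoderSketch`, its `categories_` attribute, and the dict-of-lists dataset format are hypothetical stand-ins for illustration, not skplumber's actual classes or data types. The idea is that `fit` rebuilds all learned state from scratch instead of appending to whatever an earlier fit left behind.

```python
class OneHotEncoderSketch:
    """Hypothetical stand-in for a custom primitive (not skplumber's real class)."""

    def fit(self, X):
        # Rebuild learned state on every call so nothing from a
        # previous fit leaks into this one.
        self.categories_ = {}
        for name, values in X.items():
            # Assumption for this sketch: a column is categorical
            # when every value in it is a string.
            if all(isinstance(v, str) for v in values):
                self.categories_[name] = sorted(set(values))
        return self

    def produce(self, X):
        out = {}
        for name, values in X.items():
            if name in self.categories_:
                # One indicator column per category seen during fit.
                for cat in self.categories_[name]:
                    out[f"{name}_{cat}"] = [int(v == cat) for v in values]
            else:
                out[name] = list(values)
        return out


def test_sequential_fit():
    encoder = OneHotEncoderSketch()
    first = {"color": ["red", "blue"], "size": [1, 2]}
    second = {"shape": ["square", "circle"], "weight": [0.5, 1.5]}

    encoder.fit(first)
    assert set(encoder.categories_) == {"color"}

    # Refit the same instance on a dataset with different columns.
    encoder.fit(second)
    assert set(encoder.categories_) == {"shape"}
    assert set(encoder.produce(second)) == {"shape_circle", "shape_square", "weight"}
```

Checking the tracked columns after the second fit, rather than only checking that no exception is raised, would also catch the case where stale columns are silently kept.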