Count featurization on pre-existing splits
scottfleming opened this issue · 1 comment
scottfleming commented
Is your feature request related to a problem? Please describe.
The featurizer API for count featurization is such that it's not clear how to impose an existing feature structure from one dataset onto another.
Describe the solution you'd like
Either make clear in the documentation that there is a "sharp-edged" pattern here, where you always have to featurize everything together and then split, or create a way to pass an existing data matrix column spec to a new featurization.
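To illustrate the second option, here is a minimal sketch in plain Python of what passing a column spec could look like. It does not use the library's featurizer API; `fit_columns` and `count_featurize` are hypothetical names, and patients are reduced to bare lists of codes purely for illustration.

```python
from collections import Counter


def fit_columns(patients):
    """Hypothetical helper: the sorted set of codes seen in a dataset is the column spec."""
    return sorted({code for codes in patients for code in codes})


def count_featurize(patients, columns):
    """Hypothetical helper: count codes per patient against a fixed column spec.

    Codes that are not in `columns` are dropped, so the output always has
    len(columns) columns regardless of which dataset is featurized.
    """
    index = {code: j for j, code in enumerate(columns)}
    matrix = []
    for codes in patients:
        row = [0] * len(columns)
        for code, n in Counter(codes).items():
            j = index.get(code)
            if j is not None:
                row[j] = n
        matrix.append(row)
    return matrix


# Toy data: each patient is just a list of codes.
train = [["A", "B", "A"], ["C"]]
test = [["B", "D", "B"]]

columns = fit_columns(train)                   # learned from the training split only
train_matrix = count_featurize(train, columns)
test_matrix = count_featurize(test, columns)   # same columns imposed on the test split
```

With this shape, a pre-existing test split lines up column for column with the training matrix, and codes that never appeared in training (here `"D"`) are ignored instead of creating new columns.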
scottfleming commented
Or, if we want to keep the current count featurization API, we should at least implement a method whereby one can easily combine two LabeledPatients objects with +.
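A rough sketch of the `+` idea, using a toy stand-in rather than the real class: `ToyLabeledPatients` and its `labels` mapping are assumptions made for illustration, and the actual LabeledPatients internals may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLabeledPatients:
    """Hypothetical stand-in: maps a patient id to that patient's list of labels."""

    labels: dict = field(default_factory=dict)

    def __add__(self, other):
        if not isinstance(other, ToyLabeledPatients):
            return NotImplemented
        # Copy the left operand's lists so neither input is mutated.
        merged = {pid: list(values) for pid, values in self.labels.items()}
        # Append the right operand's labels, concatenating for shared patients.
        for pid, values in other.labels.items():
            merged.setdefault(pid, []).extend(values)
        return ToyLabeledPatients(merged)


train = ToyLabeledPatients({1: [True], 2: [False]})
test = ToyLabeledPatients({2: [True], 3: [False]})

# Combine the splits, featurize once, then split again by patient id.
combined = train + test
```

How labels for a patient present in both operands should be handled (concatenate, deduplicate, or raise) is a design decision; the sketch simply concatenates.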