som-shahlab/femr

Count featurization on pre-existing splits

scottfleming opened this issue · 1 comments

Is your feature request related to a problem? Please describe.
With the current count featurization API, it is not clear how to impose the feature structure (the data matrix columns) derived from one dataset, such as a pre-existing training split, onto another dataset.

Describe the solution you'd like
Either make it clear in the documentation that there is a "sharp-edged" pattern here, where you always have to featurize everything together and then split, or provide a way to pass an existing data matrix column spec to a new featurization.

Or, if we want to keep the current count featurization API, we should at least implement a method for easily combining two LabeledPatients objects with the + operator.
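
To illustrate the "column spec" idea, here is a minimal sketch of the fit-on-one-split, transform-on-another pattern. This is not the femr API: the class `FixedVocabCountFeaturizer`, its `fit`/`transform` methods, and the representation of a patient as a plain list of code strings are all hypothetical, chosen only to show the behavior being requested.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


class FixedVocabCountFeaturizer:
    """Hypothetical stand-in for a count featurizer (NOT the femr API)."""

    def __init__(self) -> None:
        # The "column spec": maps each code to its column index.
        self.columns: Dict[str, int] = {}

    def fit(self, patients: Iterable[Sequence[str]]) -> "FixedVocabCountFeaturizer":
        # Learn the column layout from one dataset only (e.g. the train split).
        codes = sorted({code for events in patients for code in events})
        self.columns = {code: i for i, code in enumerate(codes)}
        return self

    def transform(self, patients: Iterable[Sequence[str]]) -> List[List[int]]:
        # Count codes per patient using the already-learned columns.
        # Codes that were not seen during fit() are dropped.
        rows = []
        for events in patients:
            row = [0] * len(self.columns)
            for code, n in Counter(events).items():
                col = self.columns.get(code)
                if col is not None:
                    row[col] = n
            rows.append(row)
        return rows


# Toy data: each patient is a list of event codes (an assumption for this sketch).
train = [["A", "B", "A"], ["B", "C"]]
test = [["A", "D"], ["C", "C"]]

featurizer = FixedVocabCountFeaturizer().fit(train)
train_matrix = featurizer.transform(train)
test_matrix = featurizer.transform(test)  # same columns as train_matrix
```

With something like this, a pre-existing test split could be featurized separately and still line up column-for-column with the training matrix, without featurizing everything together first.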