Encoder improvements
Closed this issue · 1 comments
stephanegaiffas commented
Here are a few things to be done in the Encoder:
Very important stuff
- Add the code that deals with numpy arrays (for now only pandas dataframes are dealt with)
- Handle non-category and non-numerical values as categorical
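To make the two items above concrete, here is a minimal sketch of how the input handling could look. The helper name `split_columns` is hypothetical and is not part of the existing Encoder; the sketch assumes that any column that is neither numerical nor of pandas `category` dtype should simply be treated as categorical.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype


def split_columns(X):
    """Hypothetical helper: return (dataframe, list of categorical columns).

    Accepts a pandas DataFrame or a 2D numpy array. A column is flagged as
    categorical when it has the pandas category dtype or any non-numerical
    dtype (object, string, datetime, ...).
    """
    if isinstance(X, np.ndarray):
        if X.ndim != 2:
            raise ValueError("X must be a 2D array")
        # A numpy array has a single dtype, so every column of the resulting
        # dataframe shares it (e.g. all columns are object for a mixed array).
        df = pd.DataFrame(X)
    elif isinstance(X, pd.DataFrame):
        df = X
    else:
        raise TypeError("X must be a pandas DataFrame or a numpy array")

    categorical = [
        col
        for col in df.columns
        if isinstance(df[col].dtype, pd.CategoricalDtype)
        or not is_numeric_dtype(df[col].dtype)
    ]
    return df, categorical
```

Note that with this choice a numpy array of dtype `object` ends up with all of its columns treated as categorical, including the ones that only contain numbers. Whether that is the desired behaviour is an open design question.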
Important stuff
- Put back all numba signatures
- check is_categorical, size, dtype, etc.
- check that "categories" are for the same columns as the ones passed
- Unit tests for non-numerical non-categorical columns
Mild stuff
- Finish all docstrings
- tests are missing, such as for n_features...
- check that X has the correct dtype
- also keep column and index information to rebuild the exact same dataframe
And we could do this in another PR:
- fit and transform in parallel over columns
- use bitsets for known categories?
- test for categories with too few modalities
- Direct numba code for this by testing the -1 directly?
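For the bitset idea, a rough sketch of what is meant, in plain numpy. The function names `make_bitset` and `in_bitset` are hypothetical, and the sketch assumes that categories are already encoded as non-negative integer codes, with a negative code such as -1 standing for an unknown category.

```python
import numpy as np


def make_bitset(known_codes, n_bits):
    """Pack the known category codes into an array of 32-bit words."""
    bitset = np.zeros((n_bits + 31) // 32, dtype=np.uint32)
    for code in known_codes:
        # Set bit (code % 32) of word (code // 32)
        bitset[code // 32] |= np.uint32(1) << np.uint32(code % 32)
    return bitset


def in_bitset(bitset, code):
    """Return True if code is a known category, False otherwise."""
    # Negative codes (assumed here to mark unknown categories) and codes
    # beyond the bitset capacity are never known.
    if code < 0 or code >= 32 * bitset.size:
        return False
    return bool((bitset[code // 32] >> np.uint32(code % 32)) & np.uint32(1))
```

Usage would be along the lines of `bs = make_bitset([0, 3, 40], 64)` followed by `in_bitset(bs, 3)`. The loops are simple enough that they should be easy to compile with numba later, but that part is not shown here.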
stephanegaiffas commented
Done in #95