Implement support for sparse feature data
ogrisel opened this issue · 0 comments
ogrisel commented
For instance, if all the data is passed as a scipy.sparse.csc_matrix
(e.g. after one-hot encoding).
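To make the input case concrete, here is a minimal sketch of what such data could look like. The helper `one_hot_csc` is hypothetical (it is not part of this project or of scipy); it only uses the standard `scipy.sparse.csc_matrix((data, (row, col)), shape=...)` constructor and assumes the categories are already encoded as integer codes.

```python
import numpy as np
from scipy import sparse


def one_hot_csc(codes, n_categories):
    """Hypothetical helper: one-hot encode integer category codes as a CSC matrix."""
    codes = np.asarray(codes)
    n_samples = codes.shape[0]
    # One non-zero entry per sample, placed in the column of its category.
    data = np.ones(n_samples, dtype=np.float64)
    rows = np.arange(n_samples)
    return sparse.csc_matrix((data, (rows, codes)), shape=(n_samples, n_categories))


X = one_hot_csc([0, 2, 1, 2], n_categories=3)

# CSC stores each column contiguously, so per-feature access is cheap.
feature_2 = X[:, 2]
```

Only the 4 non-zero entries are stored here, instead of the 12 cells of the dense equivalent.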
Pandas has support for sparse features: http://pandas.pydata.org/pandas-docs/stable/sparse.html
In particular, it has a dedicated data structure for 1D sparse data: SparseArray.
There is also https://github.com/pydata/sparse, and I believe the ecosystem will converge at some point. I would be in favor of leveraging the data structures from pandas to start with, as the most widely adopted solution that allows for heterogeneously typed features (a mix of dense and sparse columns, categorical or numerical).
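As a rough illustration of the heterogeneous case, the sketch below builds a DataFrame that mixes a dense numerical column, a categorical column, and a sparse column. It assumes a recent pandas version where `pd.arrays.SparseArray` and `pd.SparseDtype` are available; the column names are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    # Dense numerical feature.
    "age": [23.0, 41.0, 35.0, 52.0],
    # Categorical feature.
    "city": pd.Categorical(["a", "b", "a", "c"]),
    # Sparse numerical feature: only values different from fill_value are stored.
    "rare_flag": pd.arrays.SparseArray([0, 0, 1, 0], fill_value=0),
})

# Detect which columns are sparse by inspecting their dtype.
sparse_cols = [
    name for name, dtype in df.dtypes.items()
    if isinstance(dtype, pd.SparseDtype)
]

# Fraction of explicitly stored values in the sparse column.
density = df["rare_flag"].sparse.density
```

An implementation could dispatch on the column dtype in this way to handle each column with a dense or a sparse code path.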