szilard/benchm-ml

sklearn using sparse data representation

szilard opened this issue · 2 comments

I know from @glouppe that "RFs in sklearn now support sparse matrices too"
https://twitter.com/glouppe/status/660012865554903040

It would be interesting to see the results with sparse input for RF and for logistic regression too. We should see a lower memory footprint and perhaps faster runs. Does anyone want to help with the code (PR)?

Good guess, but the reality may be cruel: sparse matrices can reduce memory usage a lot, but there is no significant speedup. sklearn depends on scipy, so if you want to try it:
in 2-rf/2.py, use http://docs.scipy.org/doc/scipy/reference/sparse.html instead of pandas to create the training matrix.
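For anyone picking this up, here is a minimal sketch of what building the training matrix with scipy.sparse could look like. It is not the code from 2-rf/2.py: the toy data frame, its column names, and the helper `one_hot_sparse` are all hypothetical, and it assumes the categorical columns have no missing values.

```python
import numpy as np
import pandas as pd
from scipy import sparse


def one_hot_sparse(df, cat_cols):
    """One-hot encode the given columns into a single CSR matrix.

    Hypothetical helper for illustration; assumes no missing values.
    """
    n = len(df)
    rows = np.arange(n)
    blocks = []
    for col in cat_cols:
        cat = df[col].astype("category").cat
        codes = cat.codes.to_numpy().astype(np.int32)
        n_levels = len(cat.categories)
        # Exactly one nonzero per row, placed in the column of that row's level.
        block = sparse.csr_matrix(
            (np.ones(n), (rows, codes)), shape=(n, n_levels)
        )
        blocks.append(block)
    return sparse.hstack(blocks, format="csr")


# Toy stand-in data (made up, not the benchmark dataset).
rng = np.random.default_rng(0)
n_rows = 1000
df = pd.DataFrame({
    "carrier": rng.choice([f"C{i}" for i in range(12)], size=n_rows),
    "origin": rng.choice([f"A{i}" for i in range(50)], size=n_rows),
    "dest": rng.choice([f"A{i}" for i in range(50)], size=n_rows),
})
cat_cols = ["carrier", "origin", "dest"]

X_sparse = one_hot_sparse(df, cat_cols)

# Memory actually held by the CSR arrays vs. what a dense float64
# array of the same shape would need (computed, not materialized).
sparse_bytes = (
    X_sparse.data.nbytes + X_sparse.indices.nbytes + X_sparse.indptr.nbytes
)
dense_bytes = X_sparse.shape[0] * X_sparse.shape[1] * 8
print(X_sparse.shape, sparse_bytes, dense_bytes)
```

The resulting matrix can be passed to an sklearn estimator's `fit` in place of a dense array; how much memory this saves on the real data depends on how many levels the categorical columns have.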

Yeah, scipy's sparse is what I had in mind, and I was hoping someone could take a look. You could try this simplified setup: https://github.com/szilard/benchm-ml/tree/master/z-other-tools with the initial Python code here: https://github.com/szilard/benchm-ml/blob/master/z-other-tools/2.py You could time this and also the sparse version, and submit the results here or as a PR.
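A rough sketch of how the dense vs. sparse timing could be set up is below. It uses small synthetic data rather than the benchmark files, and the sizes, density, hyperparameters, and the helper `time_fit` are assumptions chosen only so the example runs quickly; real numbers would have to come from the actual dataset.

```python
import time

import numpy as np
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression


def time_fit(model, X, y):
    """Return wall-clock seconds taken by model.fit(X, y)."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


# Synthetic data: 2% nonzeros, random binary labels (illustrative only).
rng = np.random.default_rng(0)
X_sparse = sparse.random(2000, 200, density=0.02, format="csr", random_state=0)
X_dense = X_sparse.toarray()
y = rng.integers(0, 2, size=X_sparse.shape[0])

# Factories so each run starts from a fresh, unfitted model.
make_model = {
    "rf": lambda: RandomForestClassifier(n_estimators=20, random_state=0),
    "logreg": lambda: LogisticRegression(max_iter=200),
}

results = {}
for name, factory in make_model.items():
    results[(name, "dense")] = time_fit(factory(), X_dense, y)
    results[(name, "sparse")] = time_fit(factory(), X_sparse, y)

for (name, kind), seconds in sorted(results.items()):
    print(f"{name:8s} {kind:6s} {seconds:.3f}s")
```

A single run like this is noisy, so repeating each fit a few times and reporting the median would give more trustworthy numbers to post here.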