rth/pysofia

Add native support for sparse arrays in pysofia

rth opened this issue · 0 comments

rth commented

Issue history copied from fabianp#6

Currently, pysofia's train and test methods accept input features either

  • as a dense numpy array, or
  • stored in an svmlight file (a sparse ASCII file format).

A writer/parser for the svmlight format is available in sklearn, but going through it adds overhead when working with large sparse feature sets. Adding the ability to natively handle scipy.sparse CSR matrices, for instance, could be useful. Sofia ML itself does handle sparse arrays as far as I understand, so it should just be a matter of adding a few more Cython wrappers.

fabianp: Could you please comment on whether, in your opinion, this would be useful? Or has Sofia ML-like functionality since been reimplemented with sparse array support in other ML projects (e.g. sklearn, sklearn-contrib-lightning, ...), so that it would be easier to just use one of those? Thanks.

Edit: OK, maybe I'm wrong. It looks like it is possible to pass sparse arrays as strings, but that is still not native support for scipy.sparse matrices.
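To make the "sparse arrays as strings" route concrete, here is a minimal sketch that turns the three CSR arrays into svmlight-style text lines without touching disk. The function name `csr_rows_to_svmlight` is hypothetical, and both the 1-based feature ids and the exact string form pysofia would accept are assumptions, not something confirmed in this thread.

```python
def csr_rows_to_svmlight(data, indices, indptr, labels):
    """Yield one svmlight-style line ("label id:value ...") per CSR row.

    data, indices and indptr are the three CSR arrays, laid out as in
    scipy.sparse.csr_matrix. Feature ids are written 1-based, which is
    an assumption about what the consumer expects.
    """
    for row, label in enumerate(labels):
        start, end = indptr[row], indptr[row + 1]
        feats = " ".join(
            # repr(float) round-trips the value exactly
            f"{indices[k] + 1}:{float(data[k])!r}" for k in range(start, end)
        )
        # rstrip() keeps an all-zero row as just its label
        yield f"{label} {feats}".rstrip()


# The 2x3 matrix [[0, 1.5, 0], [2, 0, 3]] in CSR form
data = [1.5, 2.0, 3.0]
indices = [1, 0, 2]
indptr = [0, 1, 3]
lines = list(csr_rows_to_svmlight(data, indices, indptr, labels=[1, -1]))
```

With a scipy matrix `X`, the same call would take `X.data`, `X.indices` and `X.indptr`. This still formats every nonzero as text, so it only avoids the temporary file, not the conversion cost.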

fabianp:

I do think it would be useful to have that. Relying on scikit-learn's svmlight parser seems like the easiest choice, and the overhead is probably not that big.
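For reference, a rough sketch of what relying on scikit-learn's svmlight writer/parser could look like, using an in-memory buffer instead of a temporary file. `dump_svmlight_file` and `load_svmlight_file` are scikit-learn functions; the helper name `csr_to_svmlight_bytes` is hypothetical, and how the resulting bytes would be handed to pysofia is left open here.

```python
import io

import numpy as np
from scipy import sparse
from sklearn.datasets import dump_svmlight_file, load_svmlight_file


def csr_to_svmlight_bytes(X, y):
    """Serialize a sparse matrix and its labels to svmlight-format bytes."""
    buf = io.BytesIO()
    # zero_based=False writes 1-based feature ids (an assumed convention)
    dump_svmlight_file(X, y, buf, zero_based=False)
    return buf.getvalue()


X = sparse.csr_matrix(np.array([[0.0, 1.5, 0.0], [2.0, 0.0, 3.0]]))
y = np.array([1, -1])
payload = csr_to_svmlight_bytes(X, y)

# Round trip through the parser to check that nothing was lost
X_back, y_back = load_svmlight_file(
    io.BytesIO(payload), n_features=3, zero_based=False
)
```

The serialization cost grows with the number of nonzero entries, so timing `csr_to_svmlight_bytes` on a realistic matrix would be one way to check whether the overhead really is small.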