sramirez/spark-infotheoretic-feature-selection

Info-Theoretic Framework requires positive values in range [0, 255]

michaelws92 opened this issue · 1 comment

Does your algorithm not support double or float values? Do you have any suggestion for data with very large values, e.g. in the millions? Your algorithm does not seem to support them.

Hi Michael,

You can discretize your data with my package spark-MDLP. I have updated the README file to include the information you asked for:

_LabeledPoint data must be discretized as integer values in double representation, ranging from 0 to 255. This allows double values to be converted directly to bytes, which makes the overall selection process much more efficient (communication overhead is greatly reduced).

Please refer to the MDLP package if you need to discretize your dataset:

https://spark-packages.org/package/sramirez/spark-MDLP-discretization_
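
To illustrate the required input format, here is a minimal plain-Python sketch that maps arbitrary doubles (including values in the millions) to integer-valued doubles in [0, 255]. It uses simple equal-width binning, which is not the MDLP algorithm and is not part of either package; the function names `equal_width_bins` and `fits_in_byte` are hypothetical and only show what "integer values in double representation" means.

```python
def equal_width_bins(values, n_bins=256):
    """Map floats to integer-valued floats in [0, n_bins - 1].

    Illustration only: equal-width binning, NOT the MDLP discretizer.
    """
    if not 1 <= n_bins <= 256:
        raise ValueError("n_bins must be between 1 and 256")
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature collapses into a single bin.
        return [0.0 for _ in values]
    width = (hi - lo) / n_bins
    # Clamp so that the maximum value lands in the last bin.
    return [float(min(int((v - lo) / width), n_bins - 1)) for v in values]


def fits_in_byte(values):
    """Check the stated constraint: whole numbers between 0 and 255."""
    return all(float(v).is_integer() and 0 <= v <= 255 for v in values)


raw = [0.5, 1_000_000.0, 2_500_000.0, 5_000_000.0]
binned = equal_width_bins(raw)
print(binned, fits_in_byte(binned))
```

A supervised discretizer such as MDLP chooses cut points using the class labels, so it will generally produce more useful bins for feature selection than this unsupervised sketch.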