maxpumperla/elephas

Use elephas to do multi-label classification

Closed this issue · 8 comments

I came across Elephas to distribute Keras models in Spark. I am using it for a multi-class classification problem and it seems to work fine.

I am wondering how to use it for multi-label classification, where the target variable is a list of 0's and 1's. For example, the target variable can be [0, 0, 1, 0, 1, 0, 0, 0, 1, 1]. I have a Keras model that works on a pandas DataFrame for a multi-label classification problem. I wanted to try it on a Spark DataFrame, but I have not been able to make it work.
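To make the target format concrete, here is a minimal sketch of how such a list of 0's and 1's can be built from the indices of the active labels. The helper name `to_multi_hot` and the chosen indices are illustrative assumptions; they are not part of Elephas or Keras.

```python
def to_multi_hot(label_indices, num_labels):
    """Return a list of 0s and 1s with a 1 at every index in label_indices."""
    active = set(label_indices)
    for idx in active:
        if not 0 <= idx < num_labels:
            raise ValueError(f"label index {idx} is outside 0..{num_labels - 1}")
    return [1 if i in active else 0 for i in range(num_labels)]


# Hypothetical sample with labels 2, 4, 8 and 9 active out of 10 possible labels.
target = to_multi_hot([2, 4, 8, 9], num_labels=10)
print(target)
```

Each position is an independent yes/no label, which is why a multi-label model typically ends in one sigmoid unit per label rather than a single softmax over classes.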

Can you please help me with this? Thanks in advance.

Can someone please help me with this?

Hello @SaiBeathanabhotla, what was the primary issue with doing multi-label classification? Was the model just performing poorly, or was there an actual error?

This may be related: #177 introduces the capability of predicting probabilities in the ElephasTransformer. Between this feature and the predict method on the SparkModel, I believe that, functionally, multi-label classification should be feasible with both approaches. The only caveat is that converting the output into predicted labels has to happen on the application side, since the output in both cases is an array of "probabilities".
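As a rough illustration of what "on the application side" could look like, the sketch below turns one array of per-label probabilities into 0/1 labels by thresholding. It assumes the model emits an independent probability per label (for example from sigmoid outputs); the function name `probabilities_to_labels`, the 0.5 threshold, and the sample values are all made up for this example and are not part of the Elephas API.

```python
def probabilities_to_labels(probabilities, threshold=0.5):
    """Mark a label as present (1) when its probability reaches the threshold."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical probabilities for one sample with 10 possible labels.
predicted = [0.05, 0.10, 0.92, 0.30, 0.71, 0.02, 0.40, 0.15, 0.88, 0.64]
labels = probabilities_to_labels(predicted)
print(labels)
```

The threshold is a tuning choice: lowering it trades precision for recall, and it can be set per label if some labels are much rarer than others.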

It was an actual error. The target variable is multi-label (an array), but Elephas was expecting a float, so I was getting a type mismatch error.
Also, since you mentioned that you have introduced the capability of predicting probabilities, I would like to try it out. Do you have a working notebook that I can use?

It should be in the next release (1.4.0), which will be coming out later tonight or early tomorrow.

This is officially in the 1.4.1 release - please try it out at your earliest convenience!

@SaiBeathanabhotla have you gotten a chance to test the feature out?

I'm going to close this out for now. Feel free to reopen this ticket or open a new one if the latest version (3.0.0) doesn't work!