Using the loss() method for the Logistic Regression example raises an IndexError
Thank you so much for making this module! We're using it for work on a genomics project. I was interested in calculating `loss(X, c)` (using the same variable names as in the Logistic Regression example) but am getting this error:
IndexError: boolean index did not match indexed array along dimension 0; dimension is 719 but corresponding boolean dimension is 720
While on the subject of losses, what is the difference between the `.loss()` method and the `.losses_` attribute? (I assume the latter is the loss as a function of the number of FISTA iterations until convergence.)
Thanks!
Oh, to easily deal with the intercept, I simply padded the data matrix with a column of ones. As for the `losses_` attribute, that is computed during training if the `LogisticGroupLasso.LOG_LOSSES` flag is set to `True`.
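In case it helps, here is roughly what that workaround looks like. The data, the grouping, and the constructor arguments (`groups`, `group_reg`, `fit_intercept`, and using a negative group index for the unregularised column) are my assumptions for illustration, not taken from your code, so double-check them against the docs:

```python
import numpy as np
from group_lasso import LogisticGroupLasso

# Toy stand-ins for the X and c from the logistic regression example;
# shapes and grouping are made up for illustration.
X = np.random.standard_normal((200, 10))
c = np.random.randint(0, 2, 200)
groups = np.repeat(np.arange(5), 2)  # two features per group

# Make the intercept an explicit column of ones so the array passed to
# loss() has the width the fitted model expects.
X_padded = np.hstack([X, np.ones((X.shape[0], 1))])
groups_padded = np.concatenate([groups, [-1]])  # assumption: a negative group
                                                # index leaves that column
                                                # unregularised

gl = LogisticGroupLasso(
    groups=groups_padded,
    group_reg=0.05,
    fit_intercept=False,  # assumption: disable the built-in intercept since
                          # we add the column of ones ourselves
)
gl.fit(X_padded, c)
print(gl.loss(X_padded, c))
```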
I will write some more about this later when I have the time. For best performance, though, I do recommend using the group lasso estimator in a pipeline, as described in this example.
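Roughly, the pipeline in that example has this shape. The data here is synthetic, the regularisation strength is made up, and I am using a plain scikit-learn `LogisticRegression` as a stand-in downstream model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from group_lasso import LogisticGroupLasso

# Toy data with some signal in the first group so that variable selection
# has something to find.
X = np.random.standard_normal((200, 10))
c = (X[:, 0] + X[:, 1] > 0).astype(int)
groups = np.repeat(np.arange(5), 2)

# The group lasso estimator acts as a variable-selection transformer in
# front of a downstream classifier.
pipe = Pipeline([
    ("variable_selection", LogisticGroupLasso(groups=groups, group_reg=1e-3)),
    ("classifier", LogisticRegression()),
])
pipe.fit(X, c)
print(pipe.score(X, c))
```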
That works too; we're exploring both options (using it as an estimator and using it as a transformer/variable-selection tool). In that case, can I still use the `loss()` method on the training data? What would be a working example of its use?
Also, I am assuming the unregularized loss function is the same as here (i.e. cross-entropy loss) when `LogisticGroupLasso.LOG_LOSSES` is set to `True`?
LOG_LOSSES specifies that the loss per iteration should be stored in a list, which is very useful for debugging.
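For example, something along these lines (data and parameters are purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from group_lasso import LogisticGroupLasso

# Turn the class-level flag on before fitting; losses_ then holds one loss
# value per FISTA iteration.
LogisticGroupLasso.LOG_LOSSES = True

X = np.random.standard_normal((200, 10))
c = np.random.randint(0, 2, 200)
gl = LogisticGroupLasso(groups=np.repeat(np.arange(5), 2), group_reg=0.05)
gl.fit(X, c)

plt.plot(gl.losses_)
plt.xlabel("FISTA iteration")
plt.ylabel("loss")
plt.show()
```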
For LogisticGroupLasso, I always use the overparametrised softmax formulation, which should be equivalent to the sigmoidal cross entropy loss in the binary classification problem.
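A quick numerical sanity check of that equivalence:

```python
import numpy as np

# For two classes, the softmax probability of class 1 equals a sigmoid of the
# logit difference, so the two cross-entropy losses coincide.
z = np.array([0.3, 1.7])                      # arbitrary pair of logits
p1_softmax = np.exp(z[1]) / np.exp(z).sum()
p1_sigmoid = 1.0 / (1.0 + np.exp(-(z[1] - z[0])))
print(np.isclose(p1_softmax, p1_sigmoid))     # True
```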