Using the loss() method for the Logistic Regression example raises an IndexError
Thank you so much for making this module! We're using it for work on a genomics project. I was interested in calculating `loss(X, c)` (using the same variable names as in the Logistic Regression example) but am getting this error:
IndexError: boolean index did not match indexed array along dimension 0; dimension is 719 but corresponding boolean dimension is 720
While on the subject of losses, what is the difference between the `.loss()` method and the `.losses_` attribute? (I assume the latter is the loss as a function of the number of FISTA iterations until convergence.)
Thanks!
Oh, to easily deal with the intercept, I simply padded the data matrix with a column of ones. As for the `losses_` attribute, that is computed during training if the `LogisticGroupLasso.LOG_LOSSES` flag is set to `True`.
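In case it helps, here is roughly what that workaround looks like. The data, the grouping, and the constructor arguments (`groups`, `group_reg`, `fit_intercept`, and using a negative group index for the unregularised column) are my assumptions for illustration, not taken from your code, so double-check them against the docs:

```python
import numpy as np
from group_lasso import LogisticGroupLasso

# Toy stand-ins for the X and c from the logistic regression example;
# shapes and grouping are made up for illustration.
X = np.random.standard_normal((200, 10))
c = np.random.randint(0, 2, 200)
groups = np.repeat(np.arange(5), 2)  # two features per group

# Make the intercept an explicit column of ones so the array passed to
# loss() has the width the fitted model expects.
X_padded = np.hstack([X, np.ones((X.shape[0], 1))])
groups_padded = np.concatenate([groups, [-1]])  # assumption: a negative group
                                                # index leaves that column
                                                # unregularised

gl = LogisticGroupLasso(
    groups=groups_padded,
    group_reg=0.05,
    fit_intercept=False,  # assumption: disable the built-in intercept since
                          # we add the column of ones ourselves
)
gl.fit(X_padded, c)
print(gl.loss(X_padded, c))
```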
I will write some more about this later when I have the time. For best performance, though, I do recommend using the group lasso estimator in a pipeline, as described in this example.
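Roughly, the pipeline in that example has this shape. The data here is synthetic, the regularisation strength is made up, and I am using a plain scikit-learn `LogisticRegression` as a stand-in downstream model:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from group_lasso import LogisticGroupLasso

# Toy data with some signal in the first group so that variable selection
# has something to find.
X = np.random.standard_normal((200, 10))
c = (X[:, 0] + X[:, 1] > 0).astype(int)
groups = np.repeat(np.arange(5), 2)

# The group lasso estimator acts as a variable-selection transformer in
# front of a downstream classifier.
pipe = Pipeline([
    ("variable_selection", LogisticGroupLasso(groups=groups, group_reg=1e-3)),
    ("classifier", LogisticRegression()),
])
pipe.fit(X, c)
print(pipe.score(X, c))
```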
That works too; we're exploring both options (using it as an estimator and using it as a transformer/variable-selection tool). In that case, can I still use the `loss()` method on the training data? What would be a working example of its use?
Also, I am assuming the unregularized loss function is the same as here (i.e. cross-entropy loss) when `LogisticGroupLasso.LOG_LOSSES` is set to `True`?
LOG_LOSSES specifies that the loss per iteration should be stored in a list, which is very useful for debugging.
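For example, something along these lines (data and parameters are purely illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt
from group_lasso import LogisticGroupLasso

# Turn the class-level flag on before fitting; losses_ then holds one loss
# value per FISTA iteration.
LogisticGroupLasso.LOG_LOSSES = True

X = np.random.standard_normal((200, 10))
c = np.random.randint(0, 2, 200)
gl = LogisticGroupLasso(groups=np.repeat(np.arange(5), 2), group_reg=0.05)
gl.fit(X, c)

plt.plot(gl.losses_)
plt.xlabel("FISTA iteration")
plt.ylabel("loss")
plt.show()
```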
For LogisticGroupLasso, I always use the overparametrised softmax formulation, which should be equivalent to the sigmoidal cross entropy loss in the binary classification problem.
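A quick numerical sanity check of that equivalence:

```python
import numpy as np

# For two classes, the softmax probability of class 1 equals a sigmoid of the
# logit difference, so the two cross-entropy losses coincide.
z = np.array([0.3, 1.7])                      # arbitrary pair of logits
p1_softmax = np.exp(z[1]) / np.exp(z).sum()
p1_sigmoid = 1.0 / (1.0 + np.exp(-(z[1] - z[0])))
print(np.isclose(p1_softmax, p1_sigmoid))     # True
```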