gokceneraslan/SparseMax.torch

Cuda enabling and Multi-label example

pdakwale opened this issue · 2 comments

Hi,
I am trying to use SparseMax utility for Multi-label classification.
I have two questions regarding it.
I am using it in my RNN architecture, which only supports CUDA types. I tried SparseMaxLoss in place of LogSoftMax in my network, but it complains that it expects a DoubleTensor instead of a CudaTensor. Is SparseMaxLoss available only for Double types, or is CUDA also supported?

It is also not clear to me how exactly to use it for multi-label classification. Is it possible to provide a test example for multi-label classification?

Thanks
Praveen

Since SparseMaxLoss is a subclass of nn.Module, I think you can just cast it to CUDA like any other nn module, e.g. with `:cuda()`.

The multi-label classification formulation described in the paper is not implemented here. You would have to tweak the implementation of SparseMaxLoss, which currently implements equations 19 and 20 of the paper, to implement equations 26 and 27 instead. Also, I don't think the formulation in the paper is as straightforward as substituting BCECriterion for ClassNLLCriterion, since they define the "multi-label" target as a distribution over the active labels, not just a binary encoding of the active labels.
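For reference, here is a small NumPy sketch (not this repo's Lua code; the function names are mine) of what the multi-class path computes: the sparsemax projection and, per my reading of equations 19 and 20, the sparsemax loss for a gold class, whose gradient is simply `-onehot(gold) + sparsemax(z)`:

```python
import numpy as np

def sparsemax(z):
    """Project a score vector z onto the probability simplex (sparsemax)."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    # support size: largest k with 1 + k * z_(k) > sum of the top-k scores
    k = ks[1 + ks * z_sorted > cumsum][-1]
    tau = (cumsum[k - 1] - 1) / k  # threshold subtracted from the scores
    return np.maximum(z - tau, 0.0), tau

def sparsemax_loss(z, gold):
    """Multi-class sparsemax loss for true class index `gold` (eqs. 19-20)."""
    p, tau = sparsemax(z)
    support = p > 0
    return -z[gold] + 0.5 * np.sum(z[support] ** 2 - tau ** 2) + 0.5
```

Note the loss is zero exactly when sparsemax(z) puts all its mass on the gold class, e.g. `sparsemax_loss(np.array([2.0, 0.0]), 0)` is 0.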

About the multi-label approach: equation 26 already gives you a scalar, so you don't need to pass it to any NLL function. Just remember to scale your q (the multi-label target) so that it is a real probability distribution, e.g. by uniform scaling.
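A NumPy sketch of that recipe, under my assumption that equation 26 generalizes the multi-class loss by replacing the one-hot target with q (so the `0.5` constant becomes `0.5 * ||q||^2`); the gradient is then `-q + sparsemax(z)`:

```python
import numpy as np

def sparsemax(z):
    # sparsemax projection onto the probability simplex
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    k = ks[1 + ks * z_sorted > cumsum][-1]
    tau = (cumsum[k - 1] - 1) / k
    return np.maximum(z - tau, 0.0), tau

def multilabel_sparsemax_loss(z, active):
    """Sketch of the multi-label loss: the binary label vector `active`
    is uniformly rescaled into a real distribution q, as suggested above."""
    q = active / active.sum()          # uniform scaling -> distribution
    p, tau = sparsemax(z)
    support = p > 0
    # assumed form of eq. 26: one-hot target replaced by q
    return -q @ z + 0.5 * np.sum(z[support] ** 2 - tau ** 2) + 0.5 * q @ q
```

With this form the loss is zero exactly when sparsemax(z) equals q, which is consistent with the multi-class case (q one-hot) above.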

In the multi-class approach, the output was passed to the NLL function because L_sparsemax is defined for a score vector z and a class k, so the layer returns a matrix (one value per sample and per class). The NLL function then just selects the entry for the relevant class and negates it.
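To make that selection step concrete, here is (in NumPy rather than Torch) the ClassNLLCriterion-style reduction over such a per-sample, per-class matrix: pick each sample's target-class entry and average the negated values.

```python
import numpy as np

def class_nll(output, targets):
    """Select each row's target-class entry and average the negatives,
    mirroring what ClassNLLCriterion does (0-indexed targets here)."""
    rows = np.arange(output.shape[0])
    return -np.mean(output[rows, targets])
```

For example, `class_nll(np.array([[0.2, 0.8], [0.5, 0.5]]), np.array([1, 0]))` returns -(0.8 + 0.5) / 2 = -0.65.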