
Python implementation of softmax loss layer

biprajiman opened this issue · 1 comments


I am trying to implement the softmax loss layer in Python for use with pycaffe. I followed the Euclidean loss example and wrote this simple code as a starting point:


import caffe
import numpy as np


class SoftmaxLossLayer(caffe.Layer):

    def setup(self, bottom, top):
        # check input pair: bottom[0] holds the scores, bottom[1] the labels
        if len(bottom) != 2:
            raise Exception("Need two inputs (scores and labels) to compute the loss.")

    def reshape(self, bottom, top):
        # check that scores and labels have the same batch size
        if bottom[0].num != bottom[1].num:
            raise Exception("Inputs must have the same batch size.")
        # self.diff caches the softmax probabilities; same shape as the scores
        self.diff = np.zeros_like(bottom[0].data, dtype=np.float32)
        # loss output is scalar
        top[0].reshape(1)

    def forward(self, bottom, top):
        scores = bottom[0].data
        labels = bottom[1].data.astype(int).ravel()
        exp_scores = np.exp(scores)
        probs = exp_scores / np.sum(exp_scores, axis=1, keepdims=True)
        correct_logprobs = -np.log(probs[range(bottom[0].num), labels])
        data_loss = np.sum(correct_logprobs) / bottom[0].num

        self.diff[...] = probs
        top[0].data[...] = data_loss

    def backward(self, top, propagate_down, bottom):
        # only the scores receive a gradient; the labels (bottom[1]) are skipped
        if not propagate_down[0]:
            return
        labels = bottom[1].data.astype(int).ravel()
        # work on a copy so the cached probabilities are not modified
        delta = self.diff.copy()
        delta[range(bottom[0].num), labels] -= 1
        bottom[0].diff[...] = delta / bottom[0].num


The code works for a simple LeNet, and the loss seems to be decreasing. I would be willing to modify this code, bring it up to standard, and share it. I need guidance on what I am missing (I read the C++ code, and this is far from what it does) and on how to modify the code to match the C++ implementation so that it is more generic.
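
A decreasing loss does not by itself confirm that the backward pass is correct. One way to check is a finite-difference gradient test. The following is a minimal sketch in standalone NumPy, outside Caffe; the function names softmax_loss and numerical_grad are hypothetical helpers that mirror the forward and backward expressions above, not part of pycaffe.

```python
import numpy as np

def softmax_loss(scores, labels):
    """Mean softmax loss and its gradient w.r.t. scores (N x C), as in the layer above."""
    n = scores.shape[0]
    exp_scores = np.exp(scores)
    probs = exp_scores / exp_scores.sum(axis=1, keepdims=True)
    loss = -np.log(probs[np.arange(n), labels]).sum() / n
    grad = probs.copy()
    grad[np.arange(n), labels] -= 1
    return loss, grad / n

def numerical_grad(scores, labels, eps=1e-5):
    """Central-difference estimate of d(loss)/d(scores), one entry at a time."""
    grad = np.zeros_like(scores)
    for idx in np.ndindex(*scores.shape):
        bumped = scores.copy()
        bumped[idx] += eps
        up = softmax_loss(bumped, labels)[0]
        bumped[idx] -= 2 * eps
        down = softmax_loss(bumped, labels)[0]
        grad[idx] = (up - down) / (2 * eps)
    return grad

# Small random problem: 4 samples, 3 classes (values chosen only for illustration).
rng = np.random.default_rng(0)
scores = rng.standard_normal((4, 3))
labels = np.array([0, 2, 1, 2])

analytic = softmax_loss(scores, labels)[1]
max_err = np.abs(analytic - numerical_grad(scores, labels)).max()
print("max abs difference:", max_err)
```

If the analytic and numerical gradients agree to within a small tolerance, the backward expression is at least consistent with the forward one.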

You may ask why I am going to this trouble: modifying Python code to create a new loss is easier for me than working through the C++ code, which might take a long time.

Thank you in advance for any help.

While a python layer is nice for academic/learning purposes, there's no need for it in caffe since the C++ one is faster and uses the GPU.

Also note that your forward expression is numerically unstable; you should look into lectures explaining softmax to see how to fix it.
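
For reference, the usual fix is to subtract each row's maximum score before exponentiating, which leaves the softmax unchanged but prevents overflow in np.exp. The sketch below is standalone NumPy rather than the Caffe C++ implementation, and stable_softmax_loss is a hypothetical name used only for illustration.

```python
import numpy as np

def stable_softmax_loss(scores, labels):
    """Mean cross-entropy of a row-wise softmax; scores is (N, C), labels is (N,)."""
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.int64).ravel()
    # Subtracting the row maximum does not change the softmax but keeps exp() <= 1.
    shifted = scores - scores.max(axis=1, keepdims=True)
    # Log-softmax via log-sum-exp, so a tiny probability is never passed to log().
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss = -log_probs[np.arange(scores.shape[0]), labels].mean()
    return loss, np.exp(log_probs)

# Scores this large overflow np.exp() when exponentiated directly.
loss, probs = stable_softmax_loss([[1000.0, 1001.0, 1002.0]], [2])
print(loss, probs)
```

The returned probabilities can be cached for the backward pass in the same way self.diff is used in the layer above.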

I'm closing this since it's a modeling/usage question. Please continue the discussion on the mailing list.

Please do not post usage, installation, or modeling questions, or other requests for help to Issues.
Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.