BVLC/caffe

Python manual sgd

aniketvartak opened this issue · 10 comments

I am trying to implement the SGD weight update manually in pycaffe, instead of using the solver.step() function. The goal is for the weights obtained via solver.step() and via the manual update to match exactly.

The setup is as follows: use MNIST data. Set the random seed in solver.prototxt with random_seed: 52. Make sure momentum: 0.0, weight_decay: 0.0, base_lr: 0.01, and lr_policy: "fixed". This is done so that I can implement the plain SGD update equation (without momentum, regularization, etc.). The equation is simply: W_{t+1} = W_t - lr * dW_t
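As a toy NumPy sketch of that update rule (the array values here are illustrative, not taken from the MNIST net):

```python
import numpy as np

lr = 0.01  # matches base_lr in solver.prototxt

# Stand-ins for a blob's data (weights) and diff (gradient)
W = np.array([0.5, -0.3, 1.2])
dW = np.array([0.1, 0.2, -0.4])

# Vanilla SGD step: W_{t+1} = W_t - lr * dW_t
W_next = W - lr * dW  # [0.499, -0.302, 1.204]
```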

Following are the two tests:

Test 1: Use pycaffe's forward() and backward() to compute the forward and backward propagation. For each layer that contains weights I do:

for k in weight_layer_idx:
    solver.net.layers[k].blobs[0].diff[...] *= lr # weights
    solver.net.layers[k].blobs[1].diff[...] *= lr # biases

Next, update the weight/biases as:

    solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
    solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff

I run this for 5 iterations.

Test 2: Run Caffe's solver.step(5).

Now, what I expect is that the two tests should yield exactly the same weights after the five iterations.

I save the weight values after each of the above tests and compute the norm of the difference between the weight vectors produced by the two tests, and I see that they are not bit-exact. Can someone spot something that I might be missing?

Following is the entire code for reference:

import caffe
caffe.set_device(0)
caffe.set_mode_gpu()
import numpy as np
from copy import copy  # used below to snapshot the weights

niter = 5
solver = None
solver = caffe.SGDSolver('solver.prototxt')

# Automatic SGD: TEST2
solver.step(niter)
# save the weights to compare later
w_solver_step = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_solver_step = copy(solver.net.layers[1].blobs[1].data.astype('float64'))

# Manual SGD: TEST1
solver = None
solver = caffe.SGDSolver('solver.prototxt')
lr = 0.01

# Get layer types
layer_types = []
for ll in solver.net.layers:
    layer_types.append(ll.type)

# Get the indices of layers that have weights in them
weight_layer_idx = [idx for idx,l in enumerate(layer_types) if 'Convolution' in l or 'InnerProduct' in l]

for it in range(1, niter+1):
    solver.net.forward()  # fprop
    solver.net.backward()  # bprop
    for k in weight_layer_idx:
        solver.net.layers[k].blobs[0].diff[...] *= lr
        solver.net.layers[k].blobs[1].diff[...] *= lr
        solver.net.layers[k].blobs[0].data[...] -= solver.net.layers[k].blobs[0].diff
        solver.net.layers[k].blobs[1].data[...] -= solver.net.layers[k].blobs[1].diff

# save the weights to compare later
w_fwdbwd_update = copy(solver.net.layers[1].blobs[0].data.astype('float64'))
b_fwdbwd_update = copy(solver.net.layers[1].blobs[1].data.astype('float64'))

# Compare
print "after iter", niter, ": weight diff: ", np.linalg.norm(w_solver_step - w_fwdbwd_update), "and bias diff:", np.linalg.norm(b_solver_step - b_fwdbwd_update)

The last line that compares the weights with the two tests produces:

after iter 5 : weight diff: 0.000203027766144 and bias diff: 1.78390789051e-05

whereas I expect this difference to be 0.0.

Any ideas?


Also note,
If I run these two tests for only one iteration, I get exactly matching weight vectors from all layers, but not for subsequent iterations.

It looks like you're not clearing the diff in each blob. If you want to match the C++ code, you need to clear the diffs manually before each forward/backward pass (set them all to 0): https://github.com/BVLC/caffe/blob/master/src/caffe/solver.cpp#L203

It's not cleared automatically in order to support gradient accumulation (the iter_size solver parameter).
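The accumulation behavior can be sketched with plain NumPy (`diff` here is a toy stand-in for a blob's diff array; in real pycaffe code you would zero every parameter blob's diff before each backward() call):

```python
import numpy as np

grad = np.array([1.0, 2.0, 3.0])  # pretend this is what backward() computes
diff = np.zeros(3)                # toy stand-in for blob.diff

# Without clearing, successive backward passes accumulate into diff
diff += grad
diff += grad
accumulated = diff.copy()         # now 2 * grad, not grad

# Clearing before each backward pass keeps per-iteration gradients
diff[...] = 0.0
diff += grad
cleared = diff.copy()             # equals grad again
```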

@aniketvartak Hi, I'm trying to do something similar because I want to perform forward pass, then update some layers, then perform backpropagation, then update the weights. My data is manually loaded MNIST images and their labels. Manually loading the data is important for my eventual application. I use an Input layer for this: n.data, n.labels = L.Input(shape=[dict(dim=[64,1,28,28]), dict(dim=[64])], transform_param=dict(scale=1./255), ntop=2)

I am trying to perform the learning without calling solver.step():

for it in range(100):
    # Manually load data - returns batch = ndarray(64,1,28,28), labels = ndarray(64)
    batch, labels = zip(*(get_random_digit() for _ in range(64)))

    # Set data into network
    solver.net.blobs['data'].data[...] = batch
    solver.net.blobs['labels'].data[...] = labels

    solver.net.forward()
    solver.net.backward()

    ## Test 2 code or Test 1 code ....

My problem is that even when I run for many iterations, no learning takes place. On the first iteration the network at least assigns different classes to the images in the mini-batch; after the first iteration, every image gets what appears to be the same random class.

At first I thought it was a problem with my inputs, so I dropped the update snippet into the LeNet example (here). Even there, accuracy stays the same over 1000 iterations, so no learning occurs.

@seanbell Any ideas?

@nathanin I think you should update the weights in each blob manually; net.backward() doesn't do this for you.

        net.backward()
        # manually update
        for layer in net.layers:
            for blob in layer.blobs:
                blob.data[...] -= current_lr * blob.diff
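Combining that update with the diff-clearing mentioned earlier in the thread, one manual iteration can be sketched in plain NumPy (the `blobs` list and the constant fake gradient are toy stand-ins, not pycaffe API):

```python
import numpy as np

current_lr = 0.01
# Toy "parameter blobs": each has data (weights) and diff (gradient)
blobs = [{"data": np.array([1.0, 2.0]), "diff": np.zeros(2)}]

for it in range(2):  # two manual iterations
    for b in blobs:
        b["diff"][...] = 0.0                   # clear accumulated gradients
        b["diff"] += np.ones_like(b["data"])   # pretend backward() produced this
        b["data"] -= current_lr * b["diff"]    # manual SGD update
```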

@automan000

Hi, thanks for the input. Updating manually like this, I ran into several problems. There were a couple of ways to work around them; the one I chose was to expose Solver::ApplyUpdate() in the Python interface. With that, the equivalent of solver.step(1) is:

solver.net.forward()
solver.net.backward()
solver.update()

I had to move the iter_ increment inside the SGD solver, but I haven't really found a problem with doing that.

@nathanin Thanks for sharing.

@nathanin
Hi, I tried to bind ApplyUpdate in _caffe.cpp, but this method is protected. How did you solve that?

@xiao7199 I moved it to public in include/caffe/sgd_solvers.hpp. Careful: I don't know whether this has consequences elsewhere. If you run into issues with this solution, please let me know. Good luck :)

mitar commented

I did changes @nathanin described above in this fork: https://github.com/mitar/caffe

Closing as this is not related to Caffe development; also the original question seems to have been answered.

mitar commented

I have opened this pull request with a fix for this issue: #6238