epapoutsellis/StochasticCIL

SVRG update at full gradient too late?

Opened this issue · 3 comments

@epapoutsellis and I saw differences between this SVRG implementation and a test implementation of my own. Checking the code:

if self.svrg_iter == 0 or update_flag:
    # update memory
    self.update_memory(x)
    if self.svrg_iter == 0:
        # initialise data passes by one due to full gradient computation from update memory
        self.data_passes[0] = 1.
    else:
        # increment by 1, since full gradient is computed again
        self.data_passes.append(round(self.data_passes[-1] + 1., 4))
    # allocate memory for the difference between the gradient of selected function at iterate
    # and the gradient at snapshot
    # this allocates at the beginning of every update_flag: is wrong
    # self.stochastic_grad_difference = x * 0.
else:
    # implements the (L)SVRG inner loop
    self.functions[function_num].gradient(x, out=self.stoch_grad_at_iterate)
    if self.store_gradients is True:
        self.stoch_grad_at_iterate.sapyb(1., self.list_stored_gradients[function_num], -1., out=self.stochastic_grad_difference)
    else:
        self.stoch_grad_at_iterate.sapyb(1., self.functions[function_num].gradient(self.snapshot), -1., out=self.stochastic_grad_difference)
    # only on gradient randomly selected is seen and appended it to data_passes
    self.data_passes.append(round(self.data_passes[-1] + 1./self.num_functions, 4))

# full gradient is added to the stochastic grad difference
if out is None:
    return self.stochastic_grad_difference.sapyb(self.num_functions, self.full_gradient_at_snapshot, 1.)
else:
    self.stochastic_grad_difference.sapyb(self.num_functions, self.full_gradient_at_snapshot, 1., out=out)
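
For reference, my reading of this code is that, with the objective being the sum of num_functions = N functions f_i, the method is meant to return the SVRG estimate

    N * (grad f_i(x) - grad f_i(snapshot)) + full_gradient_at_snapshot

which, at a snapshot update where x becomes the new snapshot, should reduce to just full_gradient_at_snapshot.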

I believe that at the "update full gradient" stage, the full gradient is actually not used to update the image. self.stochastic_grad_difference is not changed in this case, and since it is used to build the return value, the returned value will actually be `full_gradient + previous_grad_diff * N`, which is incorrect.

(I think the current code happens to work at the first iteration only because self.stochastic_grad_difference is initialised to zero.)
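
To make this concrete, here is a toy sketch with plain floats standing in for the CIL objects (numbers are made up for illustration, not taken from a real run):

# plain floats standing in for images/gradients; values are arbitrary
num_functions = 10
full_gradient_at_snapshot = 2.0    # full gradient just recomputed by update_memory
previous_grad_diff = 0.3           # stale stochastic_grad_difference from the last inner iteration

# what the current update_flag=True path returns (difference buffer not reset):
returned = previous_grad_diff * num_functions + full_gradient_at_snapshot   # 5.0
# what an SVRG snapshot step should return:
expected = full_gradient_at_snapshot                                        # 2.0
print(returned, expected)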

Instead, I think the update_flag=True case needs to have

        if out is not None:
            out.fill(self.full_gradient_at_snapshot)
            return out
        return self.full_gradient_at_snapshot
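
As a sanity check, here is a minimal self-contained sketch of the intended behaviour with plain numpy arrays and made-up names (not the CIL API, and not a drop-in patch): the snapshot branch returns exactly the full gradient, while the inner loop keeps the usual SVRG form.

import numpy as np

def toy_svrg_estimate(grads, x, snapshot, full_grad_at_snapshot, i, update_flag):
    """Toy SVRG estimator for F = sum_i f_i (illustration only)."""
    n = len(grads)
    if update_flag:
        # snapshot step: return the freshly recomputed full gradient, as suggested above
        return sum(g(x) for g in grads)
    # inner step: n * (grad f_i(x) - grad f_i(snapshot)) + full gradient at snapshot
    return n * (grads[i](x) - grads[i](snapshot)) + full_grad_at_snapshot

# quick check on quadratics f_i(x) = 0.5 * a_i * x^2, so grad f_i(x) = a_i * x
a = np.array([1.0, 2.0, 3.0])
grads = [lambda x, ai=ai: ai * x for ai in a]
print(toy_svrg_estimate(grads, 0.5, None, None, 0, True))   # full gradient: sum(a) * 0.5 = 3.0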

pinging @jakobsj @zeljkozeljko @MargaretDuff for confirmation

@KrisThielemans - do you want to try https://github.com/MargaretDuff/CIL-margaret/tree/svrg? This is the latest branch and the one we will (hopefully) be merging into CIL.

That one doesn't have the above logic problem as far as I can see. I have a few minor comments on that function. Where/how do you want me to comment on it?