epapoutsellis/StochasticCIL

SVRG update at full gradient too late?

Opened this issue · 3 comments

@epapoutsellis and I saw differences between this SVRG implementation and a test implementation of my own. Checking the code:

if self.svrg_iter == 0 or update_flag:
    # update memory
    self.update_memory(x)
    if self.svrg_iter == 0:
        # initialise data passes by one due to full gradient computation from update memory
        self.data_passes[0] = 1.
    else:
        # increment by 1, since full gradient is computed again
        self.data_passes.append(round(self.data_passes[-1] + 1., 4))
    # allocate memory for the difference between the gradient of selected function at iterate
    # and the gradient at snapshot
    # this allocates at the beginning of every update_flag: is wrong
    # self.stochastic_grad_difference = x * 0.
else:
    # implements the (L)SVRG inner loop
    self.functions[function_num].gradient(x, out=self.stoch_grad_at_iterate)
    if self.store_gradients is True:
        self.stoch_grad_at_iterate.sapyb(1., self.list_stored_gradients[function_num], -1., out=self.stochastic_grad_difference)
    else:
        self.stoch_grad_at_iterate.sapyb(1., self.functions[function_num].gradient(self.snapshot), -1., out=self.stochastic_grad_difference)
    # only on gradient randomly selected is seen and appended it to data_passes
    self.data_passes.append(round(self.data_passes[-1] + 1./self.num_functions, 4))

# full gradient is added to the stochastic grad difference
if out is None:
    return self.stochastic_grad_difference.sapyb(self.num_functions, self.full_gradient_at_snapshot, 1.)
else:
    self.stochastic_grad_difference.sapyb(self.num_functions, self.full_gradient_at_snapshot, 1., out=out)
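
For reference, my reading of this code is that, with the objective being the sum of num_functions = N functions f_i, the method is meant to return the SVRG estimate

    N * (grad f_i(x) - grad f_i(snapshot)) + full_gradient_at_snapshot

which, at a snapshot update where x becomes the new snapshot, should reduce to just full_gradient_at_snapshot.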

I believe that at the "update full gradient" stage, the full gradient is actually not used to update the image. self.stochastic_grad_difference is not changed in this case, and since it is used to build the return value, the returned value will actually be `full_gradient + previous_grad_diff * N`, which is incorrect.

(I think the current code happens to work at the first iteration only because self.stochastic_grad_difference is initialised to zero.)
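
To make this concrete, here is a toy sketch with plain floats standing in for the CIL objects (numbers are made up for illustration, not taken from a real run):

# plain floats standing in for images/gradients; values are arbitrary
num_functions = 10
full_gradient_at_snapshot = 2.0    # full gradient just recomputed by update_memory
previous_grad_diff = 0.3           # stale stochastic_grad_difference from the last inner iteration

# what the current update_flag=True path returns (difference buffer not reset):
returned = previous_grad_diff * num_functions + full_gradient_at_snapshot   # 5.0
# what an SVRG snapshot step should return:
expected = full_gradient_at_snapshot                                        # 2.0
print(returned, expected)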

Instead, I think the update_flag=True case needs to have

        if out is not None:
            out.fill(self.full_gradient_at_snapshot)
            return out
        return self.full_gradient_at_snapshot
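
As a sanity check, here is a minimal self-contained sketch of the intended behaviour with plain numpy arrays and made-up names (not the CIL API, and not a drop-in patch): the snapshot branch returns exactly the full gradient, while the inner loop keeps the usual SVRG form.

import numpy as np

def toy_svrg_estimate(grads, x, snapshot, full_grad_at_snapshot, i, update_flag):
    """Toy SVRG estimator for F = sum_i f_i (illustration only)."""
    n = len(grads)
    if update_flag:
        # snapshot step: return the freshly recomputed full gradient, as suggested above
        return sum(g(x) for g in grads)
    # inner step: n * (grad f_i(x) - grad f_i(snapshot)) + full gradient at snapshot
    return n * (grads[i](x) - grads[i](snapshot)) + full_grad_at_snapshot

# quick check on quadratics f_i(x) = 0.5 * a_i * x^2, so grad f_i(x) = a_i * x
a = np.array([1.0, 2.0, 3.0])
grads = [lambda x, ai=ai: ai * x for ai in a]
print(toy_svrg_estimate(grads, 0.5, None, None, 0, True))   # full gradient: sum(a) * 0.5 = 3.0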

pinging @jakobsj @zeljkozeljko @MargaretDuff for confirmation

@KrisThielemans - do you want to try https://github.com/MargaretDuff/CIL-margaret/tree/svrg? This is the latest branch and the one we will (hopefully) be merging into CIL.

That one doesn't have the above logic problem as far as I can see. I have a few minor comments on that function. Where/how do you want me to comment on it?