loeweX/Greedy_InfoMax

InfoNCE_Loss skipping

rschwarz15 opened this issue · 2 comments

In your paper the following is stated:

For each patch x_{i,j} in row i and column j of this grid, we predict up to K patches x_{i+K,j} in the rows underneath, skipping the first overlapping patch x_{i+1,j}

However, in the implementation of InfoNCE_Loss, skip_step is applied to all k predictions. This means that x_{i+k,j} is skipped for every k, instead of only when k = 1, so nearby non-overlapping patches are also being skipped when k > 1.

for k in range(1, self.k_predictions + 1):
    ### compute log f(c_t, x_{t+k}) = z^T_{t+k} W_k c_t
    # compute z^T_{t+k} W_k:
    ztwk = (
        self.W_k[k - 1]
        .forward(z[:, :, (k + skip_step):, :])  # Bx, C, H, W
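
To make concrete what that slice selects, here is a minimal standalone sketch with assumed tensor shapes (not the repository code, just an illustration of the indexing):

# Sketch with assumed shapes: which rows does z[:, :, (k + skip_step):, :] keep?
import torch

B, C, H, W = 4, 16, 7, 7        # hypothetical batch, channel, and grid sizes
z = torch.randn(B, C, H, W)
skip_step, K = 1, 5

for k in range(1, K + 1):
    targets = z[:, :, (k + skip_step):, :]        # rows i + k + skip_step
    contexts = z[:, :, : H - (k + skip_step), :]  # rows i they are scored against
    # the effective prediction distance is k + skip_step, i.e. k + 1 here
    print(k, k + skip_step, targets.shape[2], contexts.shape[2])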

By simply changing skip_step to 0 after the first iteration, I have seen an improvement. I haven't run it long enough to compare this improvement to the results stated in your paper.
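
For reference, the change I tried looks roughly like the following (a sketch only; step is a local name I introduced here, and the snippet is truncated the same way as the quote above):

for k in range(1, self.k_predictions + 1):
    # hypothetical change: apply skip_step only on the first prediction step
    step = skip_step if k == 1 else 0
    ztwk = (
        self.W_k[k - 1]
        .forward(z[:, :, (k + step):, :])  # Bx, C, H, W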

Thank you for submitting an issue. I looked into it, and I believe the current implementation is correct. Let me know if you feel I misunderstood your point.

The idea is that you want to predict k = {1, ..., K} steps into the future. Now, to skip the first step, we add skip_step = 1 to each k, resulting in k = {2, ..., K+1} (and thus skipping k = 1). As far as I understand, your proposed solution would result in k = {2, 2, ..., K} instead; this leads to a smaller loss during training (since predicting smaller k's is easier), but will probably harm downstream performance.
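
Spelling out the resulting prediction distances with an assumed K = 5 for illustration:

# Effective prediction distances k + skip_step, assuming K = 5 and skip_step = 1.
K, skip_step = 5, 1

current = [k + skip_step for k in range(1, K + 1)]                      # [2, 3, 4, 5, 6] -> k = {2, ..., K+1}
proposed = [k + (skip_step if k == 1 else 0) for k in range(1, K + 1)]  # [2, 2, 3, 4, 5] -> k = {2, 2, ..., K}

print(current, proposed)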

Sorry, everything you have said is correct; this was an accidental miscomprehension on my part.

It is curious, though, that for the task I applied it to, accidentally using k = {2, 2, ..., K} increased downstream classification performance. Possibly predictions that are too far away detract from performance, or the increased weighting of closer predictions helps downstream performance. Regardless, there is no issue.