cornellius-gp/linear_operator

Symmetric matrices and convolution operators

sebastienwood opened this issue · 3 comments

Hi!
I'm interested in the library to reduce the memory and computational complexity of computing covariance matrices of a convolutional operation. The covariance is symmetric by construction, which would roughly halve the burden. The result may be consumed by other convolutional layers/other operations, or in some cases only the diagonal may be kept (which should in turn reduce the storage to the square root of the full covariance's size).

Edit: it would actually be symmetric tensors with shape channel*width*height*channel*width*height

Is it possible with the library at the moment? Should there be concerns about performance with respect to the cuDNN implementations of convolutions?

A possible approach I thought of involves unfolding the input, but I'm not sure it is fully supported by the library, and it is somewhat inefficient in regular PyTorch.
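For concreteness, here is a minimal sketch of the unfold (im2col) route mentioned above, which rewrites the convolution as a matrix multiply; all shapes and names are illustrative, not something the library provides:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only.
N, C_in, H, W = 8, 3, 16, 16
C_out, k = 4, 3

x = torch.randn(N, C_in, H, W)
weight = torch.randn(C_out, C_in, k, k)

# im2col: (N, C_in*k*k, L) where L is the number of spatial locations.
cols = F.unfold(x, kernel_size=k, padding=k // 2)   # (N, C_in*k*k, L)
out = weight.view(C_out, -1) @ cols                 # (N, C_out, L)
out = out.view(N, C_out, H, W)

# Sanity check against the cuDNN-backed convolution.
ref = F.conv2d(x, weight, padding=k // 2)
print(torch.allclose(out, ref, atol=1e-5))
```

Once the convolution is expressed as a matmul like this, the output covariance becomes a matrix congruence of the input covariance, which is where a structured-operator library could help.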

it would actually be symmetric tensors with shape channel*width*height*channel*width*height

Do you mean channel x (width*height x width*height) or (channel*width*height) x (channel*width*height)? I.e. is this a covariance per channel or a full covariance across channels also?

I don't think we currently have a low-level representation of a symmetric matrix that stores only n(n+1)/2 of the n^2 elements. As you say, this saves about 50% memory, which is not all that much relative to, e.g., using a low-rank matrix or the like.
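To make the ~50% figure concrete, here is a rough sketch in plain PyTorch of what triangular packing would look like; this is not a feature of linear_operator:

```python
import torch

n = 1000
A = torch.randn(n, n)
S = A @ A.T                          # dense symmetric matrix: n^2 elements

# Pack only the upper triangle: n*(n+1)/2 elements instead of n^2.
iu = torch.triu_indices(n, n)
packed = S[iu[0], iu[1]]

# Unpack when the full matrix is needed.
S_rec = torch.zeros(n, n)
S_rec[iu[0], iu[1]] = packed
S_rec = S_rec + S_rec.T - torch.diag(S_rec.diagonal())
print(torch.allclose(S, S_rec))
```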

Should there be concerns about performance with respect to the cuDNN implementations of convolutions?

So you want to run this on cuDNN directly? I don't know about this, but I don't think cuDNN would be able to exploit a symmetric representation on the LinearOperator end directly inside the convolution. It seems like something that would need to be implemented on the cuDNN side (not an expert, though).

This would be the full covariance, channels included :)
The issue with it is that it is generated from an exact moments computation. Would we not need to first compute it before "extracting" a low-rank representation of it? Hence the choice to compute and store only half of it. If you have any better idea, I'd be interested to discuss it further.

I thought LinearOperator used cuDNN, or its sparse counterpart, to some extent; since the convolution still needs to be performed, I think it is quite amenable to GPU implementation in general.

The issue with it is that it is generated from an exact moments computation. Would we not need to first compute it before "extracting" a low-rank representation of it? Hence the choice to compute and store only half of it. If you have any better idea, I'd be interested to discuss it further.

So the operator you care about is F @ F^T for some operator F? This is possible with this library: see for example RootLinearOperator. (Apologies for the lack of documentation for this operator; documentation is a work in progress.)
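A minimal sketch of what that could look like; the exact method names may differ slightly between versions, so treat this as illustrative rather than authoritative:

```python
import torch
from linear_operator import to_linear_operator
from linear_operator.operators import RootLinearOperator

# F is the "root" factor; the covariance F @ F.T is never materialized.
F = torch.randn(500, 64)
cov = RootLinearOperator(to_linear_operator(F))

v = torch.randn(500, 10)
result = cov @ v          # evaluated as F @ (F.T @ v), without forming F @ F.T
diag = cov.diagonal()     # squared row norms of F; no full matrix needed
```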

I thought LinearOperator used cuDNN, or its sparse counterpart, to some extent; since the convolution still needs to be performed, I think it is quite amenable to GPU implementation in general.

Your LinearOperator can use PyTorch commands that utilize cuDNN. For example, I would imagine that the _matmul function for your LinearOperator would call torch.nn.functional.conv2d or something like that, which uses cuDNN under the hood.
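A rough, untested sketch of that idea, assuming stride 1 and "same" padding; the class name `Conv2dLinearOperator` and the `in_shape` argument are hypothetical, and the set of required private methods may differ by version, so please check the library's docs on defining custom operators:

```python
import torch
from linear_operator.operators import LinearOperator


class Conv2dLinearOperator(LinearOperator):
    """Hypothetical sketch: the linear map of a stride-1, 'same'-padded conv,
    viewed as a (C_out*H*W) x (C_in*H*W) operator. Not part of the library."""

    def __init__(self, weight, in_shape):
        # weight: (C_out, C_in, k, k); in_shape: (C_in, H, W)
        super().__init__(weight, in_shape=in_shape)
        self.weight = weight
        self.in_shape = in_shape

    def _matmul(self, rhs):
        # rhs: (C_in*H*W, T) -- treat each column as an image and run the
        # cuDNN-backed convolution on the whole batch at once.
        C_in, H, W = self.in_shape
        k = self.weight.shape[-1]
        imgs = rhs.T.reshape(-1, C_in, H, W)
        out = torch.nn.functional.conv2d(imgs, self.weight, padding=k // 2)
        return out.reshape(rhs.shape[-1], -1).T

    def _size(self):
        C_in, H, W = self.in_shape
        C_out = self.weight.shape[0]
        return torch.Size([C_out * H * W, C_in * H * W])

    def _transpose_nonbatch(self):
        # The transpose of a convolution is a transposed convolution with a
        # flipped kernel; omitted in this sketch.
        raise NotImplementedError
```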