This package uses a numerical trick to perform the operations of
`torch.nn.functional.unfold` and `torch.nn.Unfold`, also known as im2col. It
extends them to higher-dimensional inputs that are currently not supported.

From the PyTorch docs:

> Currently, only 4-D input tensors (batched image-like tensors) are supported.

`unfoldNd` implements the operation for 3d and 5d inputs and shows good
performance.
---
News:

- [2021-05-02 Sun]: `unfoldNd` now also generalizes the `fold` operation
  (`col2im`) to 3d/4d/5d inputs.
```bash
pip install --user unfoldNd
```
This package offers the following main functionality:

- `unfoldNd.unfoldNd`: like `torch.nn.functional.unfold`, but supports 3d, 4d,
  and 5d inputs.
- `unfoldNd.UnfoldNd`: like `torch.nn.Unfold`, but supports 3d, 4d, and 5d
  inputs.
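To make the extracted-patch layout concrete, here is a minimal sketch using the built-in `torch.nn.functional.unfold` on a 4d input. The tensor sizes are just illustrative; per the list above, `unfoldNd.unfoldNd` accepts the same arguments while also handling 3d and 5d inputs.

```python
import torch

# Small batched image: batch=1, channel=1, 4x4 spatial.
x = torch.arange(16.0).reshape(1, 1, 4, 4)

# Extract all sliding 2x2 patches (im2col) and stack them as columns.
patches = torch.nn.functional.unfold(x, kernel_size=2)

# Output shape: (batch, channels * prod(kernel_size), num_patches).
# A 4x4 input with a 2x2 kernel and stride 1 yields 3*3 = 9 patches
# of 1*2*2 = 4 elements each.
print(patches.shape)  # torch.Size([1, 4, 9])
```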
It turned out that the multi-dimensional generalization of
`torch.nn.functional.unfold` can also be used to generalize
`torch.nn.functional.fold`, exposed through:

- `unfoldNd.foldNd`: like `torch.nn.functional.fold`, but supports 3d, 4d, and
  5d inputs.
- `unfoldNd.FoldNd`: like `torch.nn.Fold`, but supports 3d, 4d, and 5d inputs.
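One detail worth keeping in mind when using the fold operation: it sums values from overlapping patches rather than undoing unfold exactly. A small sketch with the built-in `torch.nn.functional.fold` (whose arguments `unfoldNd.foldNd` mirrors, per the list above) makes this visible; the sizes are just illustrative:

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 4, 4)
patches = F.unfold(x, kernel_size=2)                    # (1, 4, 9)
recon = F.fold(patches, output_size=(4, 4), kernel_size=2)

# fold (col2im) sums overlapping patch entries, so each pixel is
# counted once per patch it appears in: corners once, other border
# pixels twice, interior pixels four times.
print(recon[0, 0])
```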
Keep in mind that, while tested, this feature is not benchmarked. However, sane
performance can be expected, as it relies on the N-dimensional unfold (which is
benchmarked) and `torch.scatter_add`.
TL;DR: If you are willing to sacrifice a bit of RAM, you can get decent
speedups with `unfoldNd` over `torch.nn.Unfold` in both the forward and
backward operations.
---
There is a continuous benchmark comparing the forward pass (and
forward-backward pass) run time and peak memory here. The settings are:

- "example": the configuration used in the example.
- "allcnnc-conv{1,2,3,4,6,7,8}": convolution layers from the All-CNNC on
  CIFAR-100 with batch size 256, borrowed from DeepOBS. Layers 5 and 9 have
  been removed because they are identical to others in terms of input/output
  shapes and hyperparameters. This is a reasonably large setting where one may
  want to compute the unfolded input, e.g. for the KFAC approximation.

The machine running the benchmark has 32GB of RAM with components

- `cpu`: Intel® Core™ i7-8700K CPU @ 3.70GHz × 12
- `cuda`: GeForce RTX 2080 Ti (11GB)
- Forward pass: `unfoldNd` is faster than `torch.nn.Unfold` in all but one
  benchmark. The latest commit run time is compared here on GPU, and here on
  CPU.
- Forward-backward pass: `unfoldNd` is faster than `torch.nn.Unfold` in all
  benchmarks. The latest commit run time is compared here on GPU, and here on
  CPU.
- Higher peak memory: the one-hot convolution approach used by `unfoldNd`
  consistently reaches higher peak memory (see here). The difference to
  `torch.nn.Unfold` exceeds the storage of the one-hot kernel; probably the
  underlying convolution requires additional memory (not confirmed).
Convolutions can be expressed as a matrix-matrix multiplication between two
objects: a matrix view of the kernel and the unfolded input. The latter results
from stacking all elements of the input that overlap with the kernel in one
convolution step into a matrix. This perspective is sometimes helpful because
it allows treating convolutions similarly to linear layers.

Extracting the input elements that overlap with the kernel can be done with a
one-hot kernel of the same dimension, using group convolutions.
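The one-hot group-convolution idea described above can be sketched in plain PyTorch for the 2d case. The helper name `unfold_via_conv` is hypothetical, and this is only a minimal illustration of the trick under simplifying assumptions (square kernel, stride 1, no padding or dilation), not the package's actual implementation:

```python
import torch
import torch.nn.functional as F


def unfold_via_conv(x, kernel_size):
    """Sketch of the one-hot group-convolution trick (2d, stride 1)."""
    N, C, H, W = x.shape
    K = kernel_size
    # One-hot kernels: output channel c*K*K + p extracts kernel position p
    # of input channel c; groups=C keeps the input channels separate.
    eye = torch.eye(K * K).reshape(K * K, 1, K, K)
    weight = eye.repeat(C, 1, 1, 1)              # (C*K*K, 1, K, K)
    out = F.conv2d(x, weight, groups=C)          # (N, C*K*K, H-K+1, W-K+1)
    return out.flatten(start_dim=2)              # (N, C*K*K, L), like F.unfold


# The group convolution with one-hot kernels reproduces F.unfold exactly.
x = torch.randn(2, 3, 5, 5)
assert torch.allclose(unfold_via_conv(x, 2), F.unfold(x, kernel_size=2))
```

Because the heavy lifting happens inside a regular convolution, this formulation inherits PyTorch's optimized convolution kernels on CPU and GPU, which is where the speedups reported above come from, at the cost of storing the one-hot weight tensor.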
This is an incomplete list of settings where the unfolded input may be useful:
- It has been used for developing second-order optimization methods in deep learning by approximating the Fisher with Kronecker factors. See A Kronecker-factored approximate Fisher matrix for convolution layers.
- I’ve used the similarity between linear and convolutional layers to implement some automatic differentiation operations for the latter in BackPACK.
Encountered a problem? Open an issue here.