
FastFaces

A PyTorch implementation of a compressed deep learning architecture using low-rank factorization.

About

The goal is to speed up and compress a network by eliminating redundancy in the 4D tensors that serve as convolutional kernels. The proposed tensor decomposition replaces the convolutional kernel with two consecutive lower-rank kernels, as shown in the figure below.

(Figure: the original convolutional kernel and its replacement by two consecutive lower-rank kernels.)

More specifically, if the convolutional layer has $C$ input channels, $N$ output channels, and kernels of spatial size $d \times d$, so that $W \in \mathbb{R}^{N \times C \times d \times d}$, the main objective is to find two intermediate kernels $\mathcal{V}$ and $\mathcal{H}$ such that $W \approx \hat{W}(\mathcal{V}, \mathcal{H})$, constraining the rank of the layer to be $K$.

The new proposed tensor has the form

$$\hat{W}^n_c = \sum_{k=1}^{K} \mathcal{V}^k_c \left(\mathcal{H}^n_k\right)^T,$$

where $\mathcal{V}^k_c \in \mathbb{R}^d$ is a vertical ($d \times 1$) filter and $\mathcal{H}^n_k \in \mathbb{R}^d$ a horizontal ($1 \times d$) filter. Given that the convolution is distributive over addition, we can produce the same number of feature maps by performing two consecutive convolutions:

$$\hat{W}^n * Z = \sum_{c=1}^{C} \hat{W}^n_c * Z_c = \sum_{k=1}^{K} \mathcal{H}^n_k * \left( \sum_{c=1}^{C} \mathcal{V}^k_c * Z_c \right).$$
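In PyTorch, this two-stage structure can be sketched as below. This is a minimal illustration of the idea, not the module used in this repository; the class name `LowRankConv2d` and its signature are assumptions.

```python
import torch
import torch.nn as nn

class LowRankConv2d(nn.Module):
    """Rank-K stand-in for a dense Conv2d with a d x d kernel:
    a vertical (d x 1) convolution with K filters followed by a
    horizontal (1 x d) convolution with N filters."""

    def __init__(self, in_channels, out_channels, kernel_size, rank, padding=0):
        super().__init__()
        d = kernel_size
        # V: C -> K feature maps with d x 1 vertical filters
        self.v = nn.Conv2d(in_channels, rank, kernel_size=(d, 1),
                           padding=(padding, 0), bias=False)
        # H: K -> N feature maps with 1 x d horizontal filters
        self.h = nn.Conv2d(rank, out_channels, kernel_size=(1, d),
                           padding=(0, padding), bias=True)

    def forward(self, x):
        return self.h(self.v(x))

# Same input/output shapes as a dense Conv2d(64, 128, 3, padding=1),
# but with K*d*(C + N) weights instead of C*N*d*d.
layer = LowRankConv2d(64, 128, kernel_size=3, rank=16, padding=1)
print(layer(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 128, 32, 32])
```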

As the main goal is to maintain the overall accuracy, the factorized filters must be kept as close as possible to the original ones. The Frobenius norm can be used to define the objective function to be minimized, leading to the problem:

$$\min_{\mathcal{V}, \mathcal{H}} \; \sum_{n=1}^{N} \sum_{c=1}^{C} \left\| W^n_c - \sum_{k=1}^{K} \mathcal{V}^k_c \left(\mathcal{H}^n_k\right)^T \right\|_F^2.$$

This problem can be solved exactly in a lower-dimensional space according to [1], using the linear mapping $\mathcal{T} : \mathbb{R}^{N \times C \times d \times d} \to \mathbb{R}^{Cd \times Nd}$ defined by

$$\left[\mathcal{T}(W)\right]_{(c-1)d + i,\; (n-1)d + j} = W^n_c(i, j).$$

This mapping corresponds to aligning all the filters together in a single matrix $\tilde{W} = \mathcal{T}(W)$.
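The rearrangement itself is a one-liner in PyTorch. The helper below is hypothetical (not the repository's code), assuming PyTorch's `(N, C, d, d)` weight layout and the 0-indexed version of the mapping above:

```python
import torch

def align_filters(w: torch.Tensor) -> torch.Tensor:
    """Map a kernel W of shape (N, C, d, d) to the matrix T(W) of
    shape (C*d, N*d), so that T(W)[c*d + i, n*d + j] = W[n, c, i, j]."""
    n, c, d, _ = w.shape
    # reorder the axes to (C, d, N, d), then merge adjacent pairs
    return w.permute(1, 2, 0, 3).reshape(c * d, n * d)
```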

Finally, the SVD of the matrix $\tilde{W}$ is computed, and the product of the first $K$ singular values and their singular vectors is mapped back to the original space. That rank-$K$ truncation is the solution to the original minimization problem; see [1] for a detailed proof.
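A minimal sketch of the whole step using `torch.linalg.svd`; the function name `low_rank_factors` and the even split of the singular values between the two factors are assumptions, not the repository's code:

```python
import torch

def low_rank_factors(w: torch.Tensor, rank: int):
    """Given a kernel w of shape (N, C, d, d), return vertical filters
    v of shape (K, C, d, 1) and horizontal filters h of shape (N, K, 1, d)
    whose composition is the best rank-K Frobenius approximation of w."""
    n, c, d, _ = w.shape
    w_mat = w.permute(1, 2, 0, 3).reshape(c * d, n * d)   # T(W)
    u, s, vh = torch.linalg.svd(w_mat, full_matrices=False)
    sqrt_s = s[:rank].sqrt()                # split sigma between the factors
    left = u[:, :rank] * sqrt_s             # (C*d, K)
    right = sqrt_s[:, None] * vh[:rank]     # (K, N*d)
    v = left.reshape(c, d, rank).permute(2, 0, 1).unsqueeze(-1)  # (K, C, d, 1)
    h = right.reshape(rank, n, d).permute(1, 0, 2).unsqueeze(2)  # (N, K, 1, d)
    return v, h

# The factors can then be copied into the two convolutions of the
# factorized layer, e.g.:
# layer.v.weight.data.copy_(v); layer.h.weight.data.copy_(h)
```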

Results

The method was tested using the publicly available implementation of the Single Shot MultiBox Detector (SSD) provided by https://github.com/qfgaohao/pytorch-ssd, on a private dataset provided by the public transportation system of Bogotá, Colombia, with the aim of developing efficient head-counting software. We were concerned with the speedup and the performance change for different layers, where the performance change is measured relative to the baseline model, which reaches an accuracy of 81.05% with a total of 6,601,011 parameters. The values in the weights reduction column represent the percentage of the original number of values in a particular layer that is kept in the compressed model.
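For intuition about the weights reduction, the per-layer fraction follows directly from the parameter counts above: a rank-$K$ factorization stores $Kd(C+N)$ values in place of $CNd^2$. A small sketch with illustrative layer sizes (the numbers are hypothetical, not values from our experiments):

```python
def weights_reduction(c: int, n: int, d: int, k: int) -> float:
    """Fraction of a d x d conv layer's weights kept after a rank-k
    factorization: k*d*(c + n) parameters instead of c*n*d*d."""
    return k * d * (c + n) / (c * n * d * d)

# e.g. a hypothetical 3x3 layer with 512 input and 512 output channels,
# compressed to rank 64, keeps about 8.3% of its original weights
print(f"{weights_reduction(512, 512, 3, 64):.1%}")
```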

References

[1] Cheng Tai, Tong Xiao, Xiaogang Wang, and Weinan E. Convolutional neural networks with low-rank regularization. CoRR, abs/1511.06067, 2016.