CAS-CLab/quantized-cnn

Questions about the codebook


Hi,
I still have several questions after reading your code and paper.

If the FC layer's parameter is the matrix W and the input is the vector x, then inference computes y = Wx.

  • After splitting W into M sub-matrices, do the M sub-matrices share the same codebook, or does each have its own codebook (so that M codebooks in total must be stored)?
  • Is the test data set (labelled data) needed to compute D(m) and B(m)?
  • The paper says that for each subspace, we compute the inner products between S(m) and every sub-codeword in D(m), and store the results in a look-up table. But S(m) is a split of the input layer, whose value is only known at inference time (e.g., when a new picture arrives at the CNN). How can we compute the inner products in advance?

I may not totally understand the method; could you please clarify these points? Thanks a lot!

  1. The M sub-matrices each have their own codebook, so M codebooks are stored.
  2. To learn D and B, you only need unlabeled images from the training subset. The optimization approximates each layer's activations in the original network, so no category labels are needed. However, if you want to fine-tune the quantized network, labelled training data is required.
  3. The inner products are only computed once an input image is fed to the network; they are input-dependent. Different input images produce different look-up tables.
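
For concreteness, here is a minimal NumPy/SciPy sketch of the quantization step. One caveat: the paper learns D and B by minimizing the approximation error of the layer responses on unlabeled training images, whereas this sketch simply runs k-means on the weight columns as a stand-in; the names `quantize_fc_weights`, `dim_sub`, and `K` are illustrative, not the repository's API.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def quantize_fc_weights(W, dim_sub=4, K=16):
    """Product-quantize an (input_dim x output_dim) FC weight matrix.

    W is split along its rows into M = input_dim / dim_sub sub-matrices,
    and each sub-matrix gets its own codebook of K codewords, so M
    codebooks are stored in total. Plain k-means on the columns stands
    in for the paper's error-minimizing optimization of D and B.
    """
    input_dim, output_dim = W.shape
    assert input_dim % dim_sub == 0
    M = input_dim // dim_sub
    codebooks, assignments = [], []
    for m in range(M):
        sub = W[m * dim_sub:(m + 1) * dim_sub, :]   # dim_sub x output_dim
        # Cluster the output_dim column vectors into K codewords.
        D_m, B_m = kmeans2(sub.T, K, minit='++')
        codebooks.append(D_m.T)   # D(m): dim_sub x K codebook
        assignments.append(B_m)   # B(m): codeword index for each column
    return codebooks, assignments
```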
  1. Then there must be something wrong in my understanding; please correct me.
    If each sub-matrix has its own codebook, couldn't we just set the codeword in the codebook equal to the sub-matrix? In other words, there would be only one codeword in the codebook, and that codeword would be exactly the sub-matrix.
    Then M codebooks need to be stored instead of the whole weight matrix, but is the total size of the M codebooks actually smaller than the original matrix?

  2. So the computation cost of the inner products should also be included in the FLOPs? And why do we need the look-up table, since every input will produce different products?

Thanks very much for your prompt reply :)

  1. The codebook is a collection of codewords, and each codeword is a vector, not a matrix.
    Let us consider the first fully-connected layer in AlexNet, which takes a 9216-D vector (9216 = 256*6*6) as input and outputs a 4096-D vector. Here, we set the subspace dimension to 4, so the number of subspaces, i.e. M, equals 9216/4 = 2304. We thus split the 9216x4096 weighting matrix W into 2304 sub-matrices, each of size 4x4096. For each sub-matrix, we learn a codebook of size 4xK, consisting of K codewords, each a 4-D vector. Note that K is much smaller than 4096. We then use this codebook to quantize the sub-matrix, i.e. each column of the sub-matrix is approximated (or replaced) by a codeword selected from this codebook.

  2. The computation cost of inner products is included in the FLOPs.
    Continuing with the above example: for a 9216-D input vector, we split it into 2304 sub-vectors, each 4-D. For each sub-vector, its inner products with all 4096 column vectors in the corresponding sub-matrix are needed. Since the sub-matrix is quantized with the codebook, we can compute a look-up table of K elements, namely the sub-vector's inner products with all codewords in that codebook. This reduces the number of inner product computations per subspace from 4096 to K. Because every column of the sub-matrix has been replaced by one of the K codewords, all 4096 inner products needed to compute the layer response can then be read from this table.
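
Continuing the same hypothetical sketch, the inference step below shows why the table pays off even though it is rebuilt for every input: each subspace computes only K inner products instead of 4096, and the 4096 values needed for the layer response are gathered from that K-entry table. The random codebooks and assignments here are placeholders for the ones learned above.

```python
import numpy as np

def fc_forward_quantized(x, codebooks, assignments, output_dim, dim_sub=4):
    """Approximate the FC layer response via per-subspace look-up tables.

    For each dim_sub-D sub-vector of x, compute K inner products (one per
    codeword) instead of output_dim, then gather the output_dim
    approximated inner products from the table via the stored indices.
    """
    y = np.zeros(output_dim)
    for m, (D_m, B_m) in enumerate(zip(codebooks, assignments)):
        x_m = x[m * dim_sub:(m + 1) * dim_sub]   # the sub-vector S(m)
        lut = x_m @ D_m                          # K-entry look-up table
        y += lut[B_m]                            # gather output_dim entries
    return y

# Example with the fc6 shapes from the thread, using random codebooks:
M, dim_sub, K, output_dim = 2304, 4, 16, 4096
codebooks = [np.random.randn(dim_sub, K) for _ in range(M)]
assignments = [np.random.randint(K, size=output_dim) for _ in range(M)]
x = np.random.randn(M * dim_sub)    # 9216-D input
y_approx = fc_forward_quantized(x, codebooks, assignments, output_dim)
```

This also answers the earlier storage question: with dim_sub = 4 and K = 16, each 4x4096 sub-matrix (16384 floats) shrinks to a 4x16 codebook plus 4096 four-bit indices, which is why storing the M codebooks and index tables takes far less space than the original matrix.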

Oh, I see! Each codeword is a vector, not a matrix.
Thanks very much for your explanation!!