limuhit/ImageCompression

Gradient of the mask with respect to importance map

Closed this issue · 23 comments

Hello,
I'm trying to implement equation (7) from the paper, but I'm having some trouble.

  1. What is the shape of the gradient?
  2. Can you post a pseudo-code of how that equation should be implemented?

Thanks!

The gradient of the importance mask has the shape n \times 64 \times (h/8) \times (w/8), where n is the batch size and h and w are the height and width of the input image. The gradient of the importance map has the shape n \times 1 \times (h/8) \times (w/8). Equation (7) maps the gradient of the importance mask back to the importance map. Let the importance map value be x; then [x \times 16] is the quantized importance value, where [.] means float2int. The gradient of x is obtained by summing the gradient of the importance mask over the channels from [x \times 16] \times 4 - 4 to [x \times 16] \times 4 + 4, a window of size 8.
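
For illustration, the mapping described above might look like the following NumPy sketch; the function name and the clipping at the channel boundaries are illustrative assumptions, not taken from the repository code:

```python
import numpy as np

def mask_grad_to_map_grad(grad_mask, imp_map, L=16, n_channels=64):
    # grad_mask: (N, 64, H/8, W/8) gradient w.r.t. the importance mask
    # imp_map:   (N, 1,  H/8, W/8) importance map values in [0, 1]
    N, C, H, W = grad_mask.shape
    q = np.floor(imp_map[:, 0] * L).astype(int)              # [x * 16], float2int
    grad_map = np.zeros((N, 1, H, W), dtype=grad_mask.dtype)
    for b in range(N):
        for i in range(H):
            for j in range(W):
                start = q[b, i, j] * (n_channels // L) - 4   # [x * 16] * 4 - 4
                lo, hi = max(start, 0), min(start + 8, C)    # window of size 8, clipped
                grad_map[b, 0, i, j] = grad_mask[b, lo:hi, i, j].sum()
    return grad_map
```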

Thanks for the reply.
You wrote that "the gradient of the importance mask has the shape n \times 64 \times (h/8) \times (w/8)" and then that "the gradient of the importance map has the shape n \times 1 \times (h/8) \times (w/8)". Can you please clarify the difference?
As for my understanding (please correct me if I'm wrong):

  • The quantized importance map has shape [batch_size, 1, h/8, w/8], and each of its values is one of L=16 quantized levels.
  • The mask has shape [batch_size, n=64, h/8, w/8] and contains only 0 or 1 values.
  • So, I assume that the gradient wrt the importance map has the same shape as the importance map, that is [batch_size, 1, h/8, w/8].

Regarding your final sentence, "The gradient of x is obtained by summing the gradient of the importance mask over the channels from [x \times 16] \times 4 - 4 to [x \times 16] \times 4 + 4, a window of size 8", I didn't understand what the final value of the gradient wrt x (i.e., p_ij in the equation) is. In the equation you use L as the final value, so does that mean you assign 16 as the gradient wrt p_ij?

Thanks again!

To begin with, 16 is not the gradient. It is a constant that should always reduce the importance map value p_ij to 0.

Let the importance mask w.r.t. p_ij be the vector [m_0ij, ..., m_63ij], and let its gradient be [f_0ij, ..., f_63ij]. q_ij = Q(p_ij \times 16) is the quantized importance value. Then the gradient of p_ij is g(q_ij) = f_(q_ij-4)ij + ... + f_(q_ij+3)ij. According to equation (7), you just need to multiply the gradient by the constant L, that is, g(q_ij) \times L.
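
Written literally as code, the per-pixel rule above might look like this short sketch (the index clipping is an assumption):

```python
def map_gradient_at_pixel(f, p_ij, L=16):
    # f: the 64 mask-gradient values [f_0ij, ..., f_63ij] at position (i, j)
    q_ij = int(p_ij * L)                               # q_ij = Q(p_ij * 16)
    lo, hi = max(q_ij - 4, 0), min(q_ij + 4, len(f))   # f_(q_ij-4)ij ... f_(q_ij+3)ij
    return L * sum(f[lo:hi])                           # g(q_ij) * L
```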

OK thanks! So, the gradient wrt p_ij is:
g(q_ij) = L * ( f(q_ij-4) + ... + f(q_ij+3) )
Correct?

And, for the case in which L=n (i.e., the importance map is quantized to the same number of levels as the number of channels of the encoder's output), the gradient wrt p_ij is:
g(q_ij) = n * ( f(q_ij-1) + f(q_ij) + f(q_ij+1) )
Correct?

Yes
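
As a made-up numeric check of the L = n case (all values here are illustrative):

```python
n = L = 8                                        # L = n special case
f = [0.0, 0.0, 0.0, 0.25, 0.5, 0.25, 0.0, 0.0]   # made-up mask gradients f_0 .. f_7
q_ij = 4
g = n * (f[q_ij - 1] + f[q_ij] + f[q_ij + 1])    # n * (f(3) + f(4) + f(5))
print(g)                                         # 8 * (0.25 + 0.5 + 0.25) = 8.0
```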

I'm still pretty confused by the above description.

(1) Where did the function f(q_ij) come from? What does it represent?
(2) The claim is that the derivative of m_kij with respect to p_ij is:
g(q_ij) = n * ( f(q_ij-1) + f(q_ij) + f(q_ij+1) )

What does this line mean? Is q_ij-1 the (i,j)-th element of q, minus 1? Let's say q_ij = 7. Is the (i,j)-th element of the gradient ∂m/∂p = n * ( f(6) + f(7) + f(8) )?

As I mentioned above, q_ij is the quantized value of p_ij in the importance map. It is an integer and an index. We use it to set the importance mask m at position (i,j) by setting the channels m_{k,i,j} = 0 if k >= q_ij. f(k) represents the gradient with respect to m_{k,i,j}.
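
A small sketch of this forward mask construction, under the shape conventions used earlier in the thread (the function name is illustrative):

```python
import numpy as np

def build_importance_mask(q, n_channels=64):
    # q: (N, H, W) integer quantized importance map
    # returns m of shape (N, n_channels, H, W) with m[k, i, j] = 0 for k >= q[i, j]
    k = np.arange(n_channels).reshape(1, n_channels, 1, 1)
    return (k < q[:, None, :, :]).astype(np.float32)
```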

Thank you for the above reply. To clarify that I understand you, can you verify an example for me? Let's say we have an importance map p with shape [14,14,1], a quantized map q(p) with shape [14,14,1], and an importance mask M(q(p)) with shape [14,14,8].

Let some arbitrary p_ij = 0.50
Then, q(p)_ij = 0.50 * L = 4
and M(q(p))_ij = [1, 1, 1, 1, 0, 0, 0, 0]
and the gradients of M with respect to q would be:
M'(q(p))_ij = [0, 0, grad * L, grad * L, 0, 0, 0, 0]

You are right.
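
For concreteness, the forward part of the 8-channel example above can be checked with a few lines (illustrative only, not the repository's code):

```python
import numpy as np

L, n = 8, 8                               # 8 quantization levels, 8 channels
p_ij = 0.50
q_ij = int(p_ij * L)                      # 4
mask = (np.arange(n) < q_ij).astype(int)
print(q_ij, mask)                         # 4 [1 1 1 1 0 0 0 0]
```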

How do you apply quantization, given that it is non-differentiable and makes the gradient zero?

As indicated in the paper, we employ the quantization operation in forward-propagation and the identity function f(x) = x in backward-propagation.

I am not sure which quantization you are asking about, the one for the importance map or the one for the codes. By the way, could you report the errors in detail? Maybe I could help figure out the problem.

I see the problem. I guess you are using a framework with automatic differentiation. If so, you should design your own quantization operation and write your own backward function. In the backward function, just copy the gradient from the next layer and pass it back to the previous layer.
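
A minimal sketch of such a custom quantization op, written here in PyTorch as an assumption (this is not the repository's code): it rounds in the forward pass and copies the gradient unchanged in the backward pass.

```python
import torch

class RoundNoGradient(torch.autograd.Function):
    """Round in the forward pass, pass the gradient through unchanged."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Identity backward: copy the gradient straight to the previous layer.
        return grad_output

x = torch.rand(4, requires_grad=True)
y = RoundNoGradient.apply(x)
y.sum().backward()
print(x.grad)  # all ones: the non-differentiable rounding is bypassed
```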

It seems Keras is built on TensorFlow. Please check https://github.com/tensorflow/compression for reference. I am not quite familiar with Keras, but I hope this project is helpful.

Respected Sir, can you please make this link active again? Earlier it was active, but now it is not working: http://www2.comp.polyu.edu.hk/~15903062r/content-weighted-image-compression.html I will be grateful to you.

Sorry that the link is inactive. As far as I know, our department server is currently being upgraded. Considering these days are holidays in Hong Kong, it may take a few days before the server resumes.

Actually, I have no idea about the details of the upgrade schedule. It is hard to say how much time it will take.