limuhit/ImageCompression

Some question about the paper

Closed this issue · 10 comments

Thanks for sharing your work. After reading your paper, I have some questions.
Regarding the binarizer part, I thought it could be used for bit allocation, but I can't understand how it works. I don't know much about BNNs, so I don't quite understand the role of the binarizer in bit allocation.
Moreover, the paper mentions that the importance map network is an alternative to an entropy estimation model. The function of the entropy model is to obtain the probability distribution, while the importance map is a spatial content weight. I am confused about this.
Thanks in advance. Looking forward to your reply.

The binarizer is adopted to transform the float values into 0s and 1s, which is a kind of quantization operation. The goal of the binarizer is to quantize the code, which is a necessary step in image compression. It is not directly involved in bit allocation; bit allocation is controlled by another part, i.e., the importance map.
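For illustration, here is a minimal sketch of this kind of binarizer, written in PyTorch style for clarity (the actual implementation in this repo may differ): the forward pass thresholds the sigmoid-activated features at 0.5, and the backward pass uses a straight-through estimator so the gradient can flow through the non-differentiable step.

```python
import torch

class Binarizer(torch.autograd.Function):
    """Quantize feature values in [0, 1] to {0, 1}.

    Forward: hard threshold at 0.5.
    Backward: straight-through estimator (gradient passed through unchanged),
    since the hard threshold has zero gradient almost everywhere.
    """

    @staticmethod
    def forward(ctx, x):
        return (x > 0.5).float()

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output

# usage (encoder_features is a hypothetical name for the encoder output):
# codes = Binarizer.apply(torch.sigmoid(encoder_features))
```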

Thanks for your reply.
Quantization is a lossy process in compression. In your compression method, the binarizer first removes some redundancy in the feature map.
Then the importance map is used to guide bit allocation. The importance map can distinguish the important and unimportant areas in the image. Specifically, does it work on the probability distribution during entropy estimation?
Moreover, the importance map and the binarized feature maps are element-wise multiplied to obtain the further quantized results.
(If my understanding is incorrect, please point it out.)
Also, I have some questions about the importance map. Where does it come from?
As illustrated for the importance map network in the paper, it is obtained by just stacking some CNN layers. If it is a learning process, what parameters does the network learn? And how can the network distinguish the importance of each element? Is the element pixel-wise or channel-wise?

Actually, we do not estimate the entropy of the codes in the paper. As a workaround, we adopt the number of binary codes allocated by the importance map, i.e., the sum of the importance map, as the upper bound of the rate in the whole framework. Then, the importance map is learned with respect to the joint rate-distortion optimization. During optimization, the texture regions and the informative parts usually have larger distortion, so to reduce the rate-distortion objective, the network tends to allocate more bits to the informative parts.
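Roughly, the connection can be sketched as follows (a simplified PyTorch-style example; the paper additionally quantizes the importance map into L levels, which is omitted here, and the function name is only for illustration):

```python
import torch

def importance_mask(p, n):
    """Build a channel-wise binary mask from an importance map.

    p : importance map of shape (H, W), with values in [0, 1]
    n : number of channels in the binary code

    At location (i, j), roughly the first n * p[i, j] channels are kept.
    The sum of the mask is then an upper bound on the number of bits
    actually spent, which is the quantity the rate loss controls.
    """
    H, W = p.shape
    k = torch.arange(n, dtype=p.dtype).view(n, 1, 1)  # channel indices 0..n-1
    mask = (k < n * p.view(1, H, W)).float()          # shape (n, H, W)
    return mask

# trimmed codes sent to the decoder (codes: binary tensor of shape (n, H, W)):
#   trimmed = codes * importance_mask(p, n)
# rate upper bound used in the loss:
#   rate = importance_mask(p, n).sum()
```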

Thank you. Sorry for bothering you again. I wonder whether the importance map net is trained at the same time as the overall network. The structure presented in the paper is an end-to-end optimization process, isn't it?

Yes, it is. Actually, we made a connection between the importance map and the rate loss. Thus, the importance map is trained with respect to the rate-distortion optimization.

Thank you for your timely reply.
I am curious about the importance map module. The feature map output by the previous layer enters the importance map network, and it outputs the importance mask.
The stacked CNN (importance map subnet) is used to learn the spatial importance of the original image.
I found the loss function in the journal edition; the loss is composed of distortion (MSE or MS-SSIM), rate, and quantization loss.
1. Which one is correlated with the importance subnetwork?
2. Is the importance map just used to guide bit allocation, or does it have other benefits?
3. What do the parameters of this subnetwork need to learn? (What determines the importance in this network?)

1. Both the rate and the distortion have gradients with respect to the importance map. Thus, the importance map is a trade-off between rate and distortion.
2. It is designed to guide the bit allocation. But with the rate-distortion optimization, the importance map tends to focus on the edge and texture regions, thus giving better visual quality at low bitrates.
3. As we mentioned in the paper, the importance map is produced from F(x) by "a network of 3 convolutional layers". Thus, the parameters of the 3 convolutional layers need to be learned (see the sketch below).
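For concreteness, that subnet can be sketched like this (PyTorch style; the channel widths and kernel sizes are illustrative assumptions, not the exact configuration used in the paper or this repo):

```python
import torch.nn as nn

class ImportanceMapNet(nn.Module):
    """Small subnet producing the importance map from the encoder features F(x).

    A final sigmoid maps the output to (0, 1), so each spatial location
    gets a single importance value.
    """

    def __init__(self, in_channels=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, features):
        # features: (N, in_channels, H, W) -> importance map: (N, 1, H, W)
        return self.net(features)
```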

I know the parameters of the 3 conv layers need to be learned. What I mean is, from my perspective, the output of these 3 conv layers is processed by a sigmoid-like operation.
The values in the feature map are transformed into a certain range, e.g., 0-1, to represent the importance. Is that right?
And the 3 conv layers can be seen as a feature extractor (here, a tool for determining importance), like a backbone network in other settings.
Is my understanding correct?

You are right.

Ok, I got it. Thanks.