TropComplique/trained-ternary-quantization

Possible to use pretrained models in quantization?

pgadosey opened this issue · 9 comments

Hi,
Thanks for this. Is it possible to just use my own pretrained models for quantization?

Hi!
This is an implementation of this paper:
https://arxiv.org/abs/1612.01064.
Please read it to understand the code.

For now, the results are:
vanilla DenseNet: 48% accuracy,
quantized DenseNet: 40% accuracy.

Yes, it is possible to use your own pretrained models.

  1. You need to edit densenet_ternary_quantization/get_densenet.py
    so it returns your pretrained model (see the sketch after this list).
  2. You need to edit utils/data_utils.py so it uses your dataset.
  3. Then you need to use densenet_ternary_quantization/train.ipynb.
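
For step 1, a rough sketch of what the edited function could return, assuming it only needs to hand back an nn.Module (the VGG16 example and num_classes are placeholders, not code from this repo):

import torch.nn as nn
from torchvision.models import vgg16

def get_model(num_classes=2):
    # load any pretrained model you like; VGG16 is only an example
    model = vgg16(pretrained=True)
    # replace the last fully connected layer to match your number of classes
    model.classifier[6] = nn.Linear(4096, num_classes)
    return model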

The main thing that you need to understand is densenet_ternary_quantization/utils.py.
It contains the main algorithm.
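
Roughly, the ternarization step from the paper looks like this (a simplified sketch, not the exact code in utils.py; the threshold fraction t and the scaling factors w_p, w_n are the hyperparameters and learned values mentioned below):

import torch

def ternarize(full_precision_weights, w_p, w_n, t=0.05):
    # the threshold is a fraction t of the largest absolute weight
    delta = t * full_precision_weights.abs().max()
    ternary = torch.zeros_like(full_precision_weights)
    # weights above the threshold become +w_p,
    # weights below the negative threshold become -w_n,
    # everything in between is set to zero
    ternary[full_precision_weights > delta] = w_p
    ternary[full_precision_weights < -delta] = -w_n
    return ternary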

Please note that quantization may not work for your model.
For example, I also tried to quantize SqueezeNet, but it didn't work.

Please let me know if you have any questions.

Thank you for the details. I will try it with my model and see if it works.

Hi, I made a few modifications and tried this with a custom model. Just wanted to ask: is this normal output, especially the 2nd and 3rd values in all_losses?
0 225116437073841664.000 10782760509240.195 0.508 0.985 0.508 0.985 717.466
1 7949443220592.859 1021668280270.049 0.505 0.998 0.505 0.998 707.510
2 893205531730.435 305606759099.317 0.506 0.998 0.506 0.998 716.858
3 246050468705.375 358482354775.415 0.504 0.999 0.504 0.999 717.034
4 934430193350.184 66677406944.780 0.505 0.998 0.505 0.998 712.632
CPU times: user 1h 1min 10s, sys: 28min 18s, total: 1h 29min 29s
Wall time: 59min 31s

Also, the size of my output model remained the same.

  1. I believe you used it on a very easy dataset because val_accuracy is quite high.
  2. It is not normal that the 2nd and 3rd values are so high. Sometimes I get the same high losses.
    It depends on the hyperparameters you used. To solve this problem you need to tweak t and the learning rates.
  3. The size in MB of a quantized model will remain the same because the weights are still stored as float32 values. But it will be much smaller if you store the weights as 2-bit values plus scaling factors (see the sketch below).
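
To make point 3 concrete, here is a back-of-the-envelope size estimate (illustrative numbers only): with 2 bits per weight plus two float32 scaling factors per layer, a layer is roughly 16 times smaller than in a float32 checkpoint.

def float32_bytes(num_weights):
    # a normal float32 checkpoint: 4 bytes per weight
    return num_weights * 4

def ternary_bytes(num_weights):
    # 2 bits per weight, rounded up to whole bytes, plus two float32 scales
    return (num_weights * 2 + 7) // 8 + 2 * 4

n = 3 * 3 * 256 * 256  # e.g. a 3x3 conv with 256 input and 256 output channels
print(float32_bytes(n) / ternary_bytes(n))  # roughly 16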

Please understand that TTQ is an experimental method. It is not ready for production; it is pure research. You need to try a lot of hyperparameters before it works.

Later, I will refactor the code so it will be easier to understand.

Yes, I actually took a pretrained VGG16 model from torchvision and did transfer learning on the Kaggle dogs-vs-cats dataset, so quite an easy dataset, and got an accuracy of 0.97. Trying out the same fine-tuned model with the aim of reducing its size using TTQ sort of improved the accuracy, I guess.
Thank you for the suggestions.

@pgadosey Hi, did you find a way to save the quantized model's parameters? I tried to do some quantization, but the model's size is unchanged. Could you help me? I really appreciate your kindness.

Hi @RichardMrLu, I got preoccupied with another project so I didn't get around to trying it, but I will let you know if I am able to. Please let me know as well if you figure it out. Thanks.

Traditional quantization, like linear quantization, works in two stages: first map the input to an integer index, then map the integer index to the output (an approximation of the input).
Maybe saving the parameters from the first stage will reduce the size of the model.
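
A small sketch of that two-stage idea, assuming 8-bit linear quantization (the function names are just for illustration):

import torch

def linear_quantize(x, num_bits=8):
    # stage 1: map real values to integer indices in [0, 2**num_bits - 1]
    scale = (x.max() - x.min()) / (2 ** num_bits - 1)
    indices = torch.round((x - x.min()) / scale).to(torch.uint8)
    return indices, scale, x.min()

def linear_dequantize(indices, scale, x_min):
    # stage 2: map integer indices back to an approximation of the input
    return indices.float() * scale + x_min

x = torch.randn(1000)
indices, scale, x_min = linear_quantize(x)           # store these: 1 byte per value plus two scalars
x_approx = linear_dequantize(indices, scale, x_min)  # approximate reconstruction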

Assume that you have the following weights after TTQ:

quantized_weights = [[0.0 -0.3  0.0]
                     [0.4  0.0  0.0]
                     [0.4 -0.3  0.4]]

You can store them like this:
string 020100121 and two numbers [0.4, -0.3].
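
A small sketch of that storage scheme (purely illustrative, not code from this repo): each weight becomes an index into {0.0, positive value, negative value}, and only the index string plus the two numbers are saved.

import torch

def encode_ternary(quantized_weights):
    # 0 -> zero weight, 1 -> the positive value, 2 -> the negative value
    positive = quantized_weights[quantized_weights > 0].max().item()
    negative = quantized_weights[quantized_weights < 0].min().item()
    codes = ''.join('0' if w == 0.0 else '1' if w > 0 else '2'
                    for w in quantized_weights.flatten().tolist())
    return codes, [positive, negative]

def decode_ternary(codes, values, shape):
    # rebuild the float tensor from the index string and the two values
    lookup = {'0': 0.0, '1': values[0], '2': values[1]}
    return torch.tensor([lookup[c] for c in codes]).reshape(shape)

w = torch.tensor([[0.0, -0.3, 0.0],
                  [0.4,  0.0, 0.0],
                  [0.4, -0.3, 0.4]])
codes, values = encode_ternary(w)   # '020100121', [0.4, -0.3]
restored = decode_ternary(codes, values, w.shape)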