tensorflow/tensorflow

the quantized form of Shape operation is not yet implemented

raninbowlalala opened this issue · 17 comments

System information

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow): no
  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux Ubuntu 16.04
  • Mobile device (e.g. iPhone 8, Pixel 2, Samsung Galaxy) if the issue happens on mobile device: N/A
  • TensorFlow installed from (source or binary): source
  • TensorFlow version (use command below): 1.9.0
  • Python version: 2.7.3
  • Bazel version (if compiling from source): 0.12.0
  • GCC/Compiler version (if compiling from source): c++11
  • CUDA/cuDNN version: 7.5.18
  • GPU model and memory: TITAN, 12GB
  • Exact command to reproduce:
    ./bazel-bin/tensorflow/contrib/lite/toco/toco --input_file=/deeplabv3_mobinetv2/frozen_inference_graph.pb --output_file=/deeplabv3_mobinetv2/foo.cc --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE --inference_type=QUANTIZED_UINT8 --input_shape=1,513,513,3 --input_array=ImageTensor --output_array=logits/semantic/BiasAdd --default_ranges_min=0 --default_ranges_max=6 --mean_value=127.5 --std_value=127.5

Describe the problem

I want to use dummy quantization to quantize the deeplabv3_mobilenetv2 model "mobilenetv2_coco_voc_trainaug" from https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md.
But I got an error saying the quantized form of the Shape operation is not yet implemented.
Do you have a plan to implement it?

Source code / logs

2018-07-19 13:49:26.114180: F tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:459] Unimplemented: this graph contains an operator of type Shape for which the quantized form is not yet implemented. Sorry, and patches welcome (that's a relatively fun patch to write, mostly providing the actual quantized arithmetic code for this op).
Aborted (core dumped)

Adding @suharshs to comment on this.

@raninbowlalala We have noted your request and will look into the quantized implementation of the Shape op. We will update you on this.

@achowdhery Thanks for your great work!

Shape should now support quantization, but I believe there may be other ops in this model that require additional work (Cast, in particular).

@raninbowlalala would you mind trying again?

@jdduke Shape now supports quantization, thank you! But I got the error log below:
2018-08-10 09:08:16.577780: F tensorflow/contrib/lite/toco/graph_transformations/quantize.cc:473] Unimplemented: this graph contains an operator of type Cast for which the quantized form is not yet implemented. Sorry, and patches welcome (that's a relatively fun patch to write, mostly providing the actual quantized arithmetic code for this op).

Do you have a plan to implement the quantized form of the Cast op?

@jdduke Hi, I downloaded the new code to convert the deeplabv3_mnv2 model, and I got the error message below:
Array MobilenetV2/Conv/Relu6, which is an input to the DepthwiseConv operator producing the output array MobilenetV2/expanded_conv/depthwise/Relu6, is lacking min/max data, which is necessary for quantization. If accuracy matters, either target a non-quantized output format, or run quantized training with your model from a floating point checkpoint to change the input graph to contain min/max information. If you don't care about accuracy, you can pass --default_ranges_min= and --default_ranges_max= for easy experimentation.
Does this mean Relu6 does not support quantization?

I'll let @suharshs comment further about how to proceed with quantization (both for the Relu6 issue and the Cast issue).

Regarding Cast: why does the model have a Cast operation?
Make sure you are converting an eval graph and not a train graph, which can sometimes contain spurious unsupported operations.

Are you trying to quantize the model using the tool below? https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/quantize

If so, you should train with that tool, which places quantization ops in the graph to collect range information. The second error you got means that the graph is not getting the correct quantization operations.

How are you producing your fake-quantized frozen graph to provide to the TFLite conversion?
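
For reference, a rough sketch of that export flow (TF 1.x; the tiny conv layer, node names, and checkpoint path are placeholders rather than your actual model):

    import tensorflow as tf

    graph = tf.Graph()
    with graph.as_default():
        inputs = tf.placeholder(tf.float32, [1, 513, 513, 3], name='input')
        # Placeholder model body; build your real eval network here.
        net = tf.layers.conv2d(inputs, 8, 3, activation=tf.nn.relu6)
        outputs = tf.identity(net, name='output')

        # Rewrites the eval graph in place, inserting FakeQuant ops whose
        # min/max variables are restored from the quantized-training checkpoint.
        tf.contrib.quantize.create_eval_graph(input_graph=graph)
        saver = tf.train.Saver()

    with tf.Session(graph=graph) as sess:
        # Checkpoint produced by training with create_training_graph().
        saver.restore(sess, '/tmp/model.ckpt')
        frozen = tf.graph_util.convert_variables_to_constants(
            sess, graph.as_graph_def(), ['output'])
        with open('/tmp/frozen_eval_graph.pb', 'wb') as f:
            f.write(frozen.SerializeToString())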

@suharshs
For the Cast op: I want to use dummy quantization for the deeplabv3+ (mobilenetv2) model named "mobilenetv2_coco_voc_trainaug", downloaded from https://github.com/tensorflow/models/blob/master/research/deeplab/g3doc/model_zoo.md. I used TensorBoard to inspect the graph, and there is a "Cast" op.

As for training a quantized model: I am sorry, I forgot to add "tf.contrib.quantize.create_eval_graph()". The model converts successfully after I add this call. Thank you very much!

Yes, after inspecting the graph, it seems there is a Cast at the start of the model. It takes in uint8 and casts to float before ResizeBilinear. This doesn't make sense for a quantized model, since all edges in a fully quantized model are uint8, and this Cast should really be ignored. So the correct solution is either to somehow remove the Cast from the graph, or for the TFLite converter to remove it in the case of a quantized model.

[Screenshot: the model graph, showing the uint8-to-float Cast feeding ResizeBilinear]

Will think more about the right way to address this, thanks!
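
One possible workaround, as an untested sketch: splice the Cast out with the graph_transforms rewriter before conversion. The file names are placeholders; the input/output arrays come from the toco command above, and remove_nodes is assumed to apply here since Cast has a single input and output:

    import tensorflow as tf
    from tensorflow.tools.graph_transforms import TransformGraph

    graph_def = tf.GraphDef()
    with open('frozen_inference_graph.pb', 'rb') as f:
        graph_def.ParseFromString(f.read())

    # remove_nodes splices out single-input/single-output ops, reconnecting
    # each Cast's consumer directly to its producer.
    stripped = TransformGraph(
        graph_def,
        ['ImageTensor'],              # input arrays
        ['logits/semantic/BiasAdd'],  # output arrays
        ['remove_nodes(op=Cast)'])

    with open('frozen_no_cast.pb', 'wb') as f:
        f.write(stripped.SerializeToString())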

@suharshs Hi, could you add quantization support for the "Sub" and "Mul" ops? I found that after I add "tf.contrib.quantize.create_eval_graph()", there are no FakeQuant nodes around "Sub" and "Mul".

@raninbowlalala Agreed! I also found that some basic ops are not supported with quantization, such as "Add", "Mul", "Sub", "Mean", etc.

The issue is that the contrib/quantize rewriter is not yet robust to arbitrary models.

It can be complicated, and sometimes not possible, to fully quantize certain models given the fused operations available at any given time. If your goal is just to get a smaller and faster model, I recommend trying the --post_training_quantize flag of tflite_convert. With that you keep inference_type=FLOAT and pass a floating-point version of your model (no need to call the contrib/quantize tool). That may provide sufficient speedup for your use case. Check it out here: https://www.tensorflow.org/performance/post_training_quantization
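
For illustration, a rough sketch of that path via the Python converter API (assuming a TF 1.x release where post_training_quantize is exposed; the file name, arrays, and shape are taken from the command at the top of this issue):

    import tensorflow as tf

    converter = tf.contrib.lite.TFLiteConverter.from_frozen_graph(
        'frozen_inference_graph.pb',
        input_arrays=['ImageTensor'],
        output_arrays=['logits/semantic/BiasAdd'],
        input_shapes={'ImageTensor': [1, 513, 513, 3]})
    # Quantizes the weights; activations and inference stay in float.
    converter.post_training_quantize = True
    tflite_model = converter.convert()
    with open('foo_post_training.tflite', 'wb') as f:
        f.write(tflite_model)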

We do plan on building better tooling for fully quantized models as well, but as a first pass --post_training_quantize will get you the furthest. If the speed/accuracy aren't sufficient, you would then add FakeQuant nodes to the graph with the contrib/quantize tool, and sometimes manually for patterns that aren't recognized.
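
As a minimal hypothetical illustration of the manual route (the placeholder graph and the [-6, 6] range are assumptions, not values from this model):

    import tensorflow as tf

    x = tf.placeholder(tf.float32, [1, 128], name='x')
    # A Mul/Add pattern the automated rewriter might not instrument.
    y = x * 0.5 + 1.0
    # Manually record an assumed activation range so toco has min/max data.
    y = tf.fake_quant_with_min_max_args(y, min=-6.0, max=6.0, num_bits=8)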


square, sqrt and squared_difference are also not supported.

@raninbowlalala,
Sorry for the delayed response. As per the documentation on post-training quantization:

You can quantize an already-trained float TensorFlow model when you convert it to TensorFlow Lite format using the TensorFlow Lite Converter.

Many operations like Addition, Multiply, Divide, Square Root, etc. are now supported as part of quantization. Please refer to this documentation for the list of supported ops.
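
For example, a minimal sketch with the current tf.lite API (the SavedModel path is a placeholder):

    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model('/tmp/saved_model')
    # Enables post-training quantization of the converted model.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    tflite_model = converter.convert()
    with open('/tmp/model_quant.tflite', 'wb') as f:
        f.write(tflite_model)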

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you.

Closing as stale. Please reopen if you'd like to work on this further.