huggingface/tflite-android-transformers

How to do FP16 quantization on GPT-2 XL?

Archelunch opened this issue · 4 comments

How could I fix this error?
```
ValueError: Message tensorflow.GraphDef exceeds maximum protobuf size of 2GB: 6234365906
```

Hi @Archelunch, which script are you using? At which step do you get this error?

output_graph = "frozen_graph.pb"`
output_graph_def = tf.graph_util.convert_variables_to_constants(
                        sess,
                        tf.get_default_graph().as_graph_def(),
                        ["sample_sequence_2/while/Exit_3"]
                    ) 

with tf.gfile.GFile(output_graph, "wb") as f:
    f.write(output_graph_def.SerializeToString())
```
ValueError                                Traceback (most recent call last)
<ipython-input-69-6930509752c3> in <module>
      8     # serialize and dump the output graph to the filesystem
      9 with tf.gfile.GFile(output_graph, "wb") as f:
---> 10     f.write(output_graph_def.SerializeToString())

ValueError: Message tensorflow.GraphDef exceeds maximum protobuf size of 2GB: 6233583551
```

Are you working with a big model? It looks like you're hitting the 2 GB size limit of protobuf, which TensorFlow uses internally to serialize the graph. This has nothing to do with quantization; it's a more general TF limitation.
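One way to sidestep the frozen-graph step entirely is to export a SavedModel, which stores weights in separate variable shards instead of inside a single GraphDef message, and then run the TFLite converter with float16 post-training quantization on that. A minimal sketch (not this repo's script; the export directory, the `context:0` input name, and `sess` are assumptions carried over from the snippet above):

```python
import tensorflow as tf  # TF 1.x, matching the snippet above

# Hypothetical tensor names -- substitute the real input/output tensors
# of your graph. `sess` is the session used in the freezing snippet.
graph = tf.get_default_graph()
context = graph.get_tensor_by_name("context:0")  # assumed input tensor
output = graph.get_tensor_by_name("sample_sequence_2/while/Exit_3:0")

# SavedModel keeps weights in variables/ shards, so no single protobuf
# message has to hold all the constants.
tf.saved_model.simple_save(
    sess, "gpt2_saved_model",
    inputs={"context": context},
    outputs={"output": output},
)

# Post-training float16 quantization during TFLite conversion.
converter = tf.lite.TFLiteConverter.from_saved_model("gpt2_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

with open("gpt2_fp16.tflite", "wb") as f:
    f.write(tflite_model)
```

Caveat: the converter may still freeze the graph internally, so a model this size could hit the same ceiling at conversion time; the SavedModel export itself, though, should serialize without tripping the 2 GB limit.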

Is there any way to loop through the .pb file in chunks, similar to this Stack Overflow question/answer, to cut down on how much is loaded into memory at once?
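For context (a sketch, not an answer from the thread): that style of streaming works when a file contains many independent length-delimited messages, so you can read one record at a time. A frozen GraphDef is a single protobuf message, and the 2 GB cap is enforced on the whole message at `SerializeToString()` time, before any bytes reach disk, so chunked file I/O can't help here. The record framing below is hypothetical, purely to illustrate what the streaming approach relies on:

```python
import struct

def iter_messages(path):
    """Yield one serialized message at a time from a file laid out as
    repeated [uint32 little-endian length][payload] records.
    (Hypothetical framing; real formats such as TFRecord add CRCs.)"""
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                return  # end of file
            (size,) = struct.unpack("<I", header)
            yield f.read(size)  # only one record in memory at a time
```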