float16 quantization runs out of memory for LSTM model
Black3rror opened this issue · 1 comment
Black3rror commented
No matter the size of the LSTM model, converting it with float16 optimization runs out of memory.
Code to reproduce the issue
The following code snippet reproduces the issue on Google Colab.
Code:
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot
def create_model():
    model = tf.keras.models.Sequential()
    # For the model to be convertible later, batch_size and sequence_length must be fixed.
    # E.g., batch_input_shape=[None, 1] will throw an error.
    # This is a limitation specific to RNNs; for FC or CNN models, batch_size can be None.
    model.add(tf.keras.layers.Embedding(
        input_dim=5,
        output_dim=1,
        batch_input_shape=[1, 1]
    ))
    model.add(tf.keras.layers.LSTM(
        units=1,
        return_sequences=False,
        stateful=False
    ))
    model.add(tf.keras.layers.Dense(5))
    return model
model = create_model()
model.summary()
model.save("/content/model/")
representative_data = np.random.randint(0, 5, (200, 1)).astype(np.float32)
def representative_dataset():
    for sample in representative_data:
        sample = np.expand_dims(sample, axis=0)  # batch_size = 1
        yield [sample]  # set sample as first (and only) input of the model
# float16 quantization
converter = tf.lite.TFLiteConverter.from_saved_model("/content/model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
# the kernel runs out of memory and crashes on the following line
tflite_quant_model = converter.convert()
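For comparison, here is a minimal sketch (my addition, not part of the original report; it assumes the same TF 2.x environment and converter settings). Applying the identical float16 configuration to a tiny Dense-only model should convert without running out of memory, which suggests the problem is specific to the LSTM conversion path rather than the float16 option itself:
import tensorflow as tf
# Hypothetical comparison model: no recurrent layers, otherwise the same converter setup.
dense_model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(5, input_shape=(1,))
])
converter = tf.lite.TFLiteConverter.from_keras_model(dense_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()  # completes normally for the non-recurrent model
print(len(tflite_model), "bytes")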
cdh4696 commented
Closing as a duplicate.