tensorflow/model-optimization

float16 quantization runs out of memory for LSTM model

Black3rror opened this issue · 1 comment

No matter the size of the LSTM model, converting it with float16 quantization runs out of memory.

Code to reproduce the issue
The following snippet reproduces the issue on Google Colab:

import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

def create_model():
  model = tf.keras.models.Sequential()

  # For the model to be convertible later, batch_size and sequence_length must be fixed.
  # E.g., batch_input_shape=[None, 1] will throw an error.
  # This limitation only applies to RNNs; for FC or CNN layers, batch_size can be None.
  model.add(tf.keras.layers.Embedding(
    input_dim=5,
    output_dim=1,
    batch_input_shape=[1, 1]
  ))

  model.add(tf.keras.layers.LSTM(
    units=1,
    return_sequences=False,
    stateful=False
  ))

  model.add(tf.keras.layers.Dense(5))

  return model

model = create_model()
model.summary()

model.save("/content/model/")

representative_data = np.random.randint(0, 5, (200, 1)).astype(np.float32)

def representative_dataset():
  for sample in representative_data:
    sample = np.expand_dims(sample, axis=0)     # batch_size = 1
    yield [sample]                              # set sample as first (and only) input of the model
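
# Note: representative_dataset() above is never assigned to converter.representative_dataset.
# float16 quantization does not require a representative dataset, so it is unused in this snippet.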

# float16 quantization
converter = tf.lite.TFLiteConverter.from_saved_model("/content/model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
# the kernel runs out of memory and crashes on the following line
tflite_quant_model = converter.convert()
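
For reference, below is a hedged sketch of an alternative conversion setup that the general TFLite RNN conversion guidance suggests for models containing TensorList ops (as LSTMs do). It is not confirmed to avoid the out-of-memory crash reported here; the Select-ops settings and the experimental flag are assumptions added for illustration, not part of the original report.

# Sketch: same float16 settings, but allow unsupported TensorList ops to run
# via the TF (Flex) runtime instead of being lowered during conversion.
# Not verified to fix the OOM reported above.
converter = tf.lite.TFLiteConverter.from_saved_model("/content/model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,   # regular TFLite builtin ops
    tf.lite.OpsSet.SELECT_TF_OPS,     # fall back to TF ops where needed
]
converter._experimental_lower_tensor_list_ops = False  # experimental flag; keep TensorList ops intact
tflite_quant_model = converter.convert()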

Closing the duplicate issue.