keras-team/keras-nlp

Gemma Model Storing and Loading after Fine tuning

Opened this issue · 4 comments

Hi there, I encountered a strange bug when trying to load a fine-tuned gemma-2b model using KerasNLP.

My fine-tuning code is the following:

```python
def fine_tune(self, X, y):
    data = generate_training_prompts(X, y)

    # Enable LoRA fine-tuning on the backbone
    self.model.backbone.enable_lora(rank=self.config['lora_rank'])

    # Reduce the input sequence length to limit memory usage
    self.model.preprocessor.sequence_length = self.config['tokenization_max_length']

    # Use AdamW (a common optimizer for transformer models)
    optimizer = keras.optimizers.AdamW(
        learning_rate=self.config['learning_rate'],
        weight_decay=self.config['weight_decay'],
    )

    # Exclude layernorm and bias terms from weight decay
    optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

    self.model.compile(
        loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=optimizer,
        weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
        sampler=self.config['sampler'],
    )

    self.model.fit(data, epochs=self.config['epochs'], batch_size=self.config['batch_size'])

    # Build a timestamped directory for this fine-tuned model
    fine_tuned_dir_name = f'fine_tuned_{self.config["basemodel"]}_{datetime.now().strftime("%Y%m%d_%H%M%S")}'
    fine_tuned_dir_path = os.path.join('models', fine_tuned_dir_name)

    # Create the directory if it doesn't exist
    os.makedirs(fine_tuned_dir_path, exist_ok=True)

    # Save the model in the directory with a specific name
    weights_file_path = os.path.join(fine_tuned_dir_path, 'weights.keras')
    self.model.save(weights_file_path)

    # Save the model configuration within the same directory
    model_config = create_model_config(self.config, np.unique(y).tolist())
    config_filename = os.path.join(fine_tuned_dir_path, 'model_config.json')
    with open(config_filename, 'w') as json_file:
        json.dump(model_config, json_file, indent=4)

    # Push model weights and config to wandb
    # Note: this may need adjusting depending on how wandb expects files to be saved
    wandb.save(os.path.join(fine_tuned_dir_path, '*'))
```
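(For reference: despite the `weights.keras` filename, `model.save()` above writes the full model — architecture, weights, and compile/optimizer state — into a single `.keras` archive. A minimal sketch of the distinction in Keras 3, with illustrative paths that are not part of the original script:)

```python
# Full model: a single .keras zip archive containing config, weights, and optimizer state.
model.save("models/full_model.keras")
reloaded = keras.saving.load_model("models/full_model.keras")

# Weights only: a .weights.h5 file. Reloading requires rebuilding the model first
# (e.g. from the original preset) and then calling load_weights on it.
model.save_weights("models/model.weights.h5")
```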

The training completes as expected in Keras. However, when I try to load the model using the weights.keras file created by the script above, I get two unexpected behaviors; see the loading script below.

```python
import keras

loaded_model = keras.saving.load_model(
    "/data/host-category-classification/nlp/classification/Gemma/models"
    "/fine_tuned_gemma-2b_20240229_151158/weights.keras"
)

loaded_model.summary()
```

First, I observed that each call to the loading process generates an unknown set of files that occupy ~10 GB of my disk indefinitely. In addition, the loading process takes forever (I haven't measured the actual time, but it should not take more than 10 minutes) compared to the gemma.load_preset method. Do you have any suggestions? There seems to be no documentation in either KerasNLP or TensorFlow regarding model storage and loading for Gemma-related models.

In addition, when loading I get this output:

```
UserWarning: compile() was not called as part of model loading because the model's compile() method is custom. All subclassed Models that have compile() overridden should also override get_compile_config() and compile_from_config(config). Alternatively, you can call compile() manually after loading.
  instance.compile_from_config(compile_config)
```
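As the warning suggests, the model can be recompiled manually after loading. A minimal sketch, reusing the settings from the fine-tuning code above (the optimizer hyperparameters here are placeholders; in practice reuse the values from `self.config`):

```python
import keras

# Reload the model; compile() is skipped during loading (see the warning above),
# so recompile manually with the same settings used for fine-tuning.
loaded_model = keras.saving.load_model(
    "/data/host-category-classification/nlp/classification/Gemma/models"
    "/fine_tuned_gemma-2b_20240229_151158/weights.keras"
)
loaded_model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.AdamW(),  # placeholder; reuse self.config values
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
```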

Hi @kreouzisv!

Would it be possible for you to provide a colab that reproduces this issue?

> the loading process takes forever [...] compared to the gemma.load_preset method

I have observed this as well. I'm at 53 minutes of CPU time on a very high-end Mac laptop and still waiting for the load to complete. [EDIT: completed at the 56-minute mark.]

The reproducer is no more complicated than using keras.callbacks.ModelCheckpoint followed by keras.saving.load_model.
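For concreteness, a sketch of that reproducer (the preset choice and training data are illustrative, not the exact setup that was timed):

```python
import keras
import keras_nlp

# Load a Gemma preset and run a single trivial training step,
# checkpointing the full model to a .keras archive.
causal_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
causal_lm.fit(
    ["the quick brown fox"],
    epochs=1,
    batch_size=1,
    callbacks=[keras.callbacks.ModelCheckpoint("checkpoint.keras")],
)

# Reloading the checkpoint is where the slowness shows up.
reloaded = keras.saving.load_model("checkpoint.keras")
```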

I suspect--without proof--that unzipping the .keras file is a meaningful part of this. For unrelated reasons, I unzipped a .keras file and found it was excruciatingly slow. (Pity that moving to zstd would be a breaking change.)

I also suspect--without proof--that the optimizer state is getting saved and restored, which will significantly increase the disk and load times vs a from_preset with no optimizer. I don't see an obvious load_model API knob in the docs to disable restoring the optimizer state to try out.
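For experimentation, load_model does accept a `compile=False` argument, which skips restoring the compiled state; whether it also avoids deserializing the saved optimizer variables is not confirmed here. A sketch:

```python
# Skip compilation when loading; the model can be recompiled manually
# afterwards if training needs to continue.
reloaded = keras.saving.load_model("checkpoint.keras", compile=False)
```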

> I suspect--without proof--that unzipping the .keras file is a meaningful part of this.

OK, ran a quick experiment, and I have some proof now. :)

I took a model I had saved as .keras, the same one referenced above.

I did:

```sh
# Extract the .keras archive (it is a zip file) and collect its contents.
unzip model.keras
mkdir contents
mv assets *.json *.h5 contents
cd contents
# Re-zip with -0 (store, no compression) and rename back to .keras.
zip -0 -r model_store *
mv model_store.zip ../model_store.keras
```
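A quick way to check how the entries in a given .keras archive are compressed (a sketch using only the standard library; the path is illustrative):

```python
import zipfile

# List each entry in the archive with its compression method and sizes.
with zipfile.ZipFile("model.keras") as archive:
    for info in archive.infolist():
        methods = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}
        method = methods.get(info.compress_type, str(info.compress_type))
        print(f"{info.filename}: {method}, "
              f"{info.compress_size} -> {info.file_size} bytes")
```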

This increased the file size from ~7.4 GB to ~12.8 GB.

But it reduced the time required to open the model from ~56 minutes to ~5 minutes.

> I also suspect--without proof--that the optimizer state...

It appears I was wrong about this.