huggingface/peft

Support merge_and_unload for IA3 adapters with 4-bit and 8-bit quantized models

Abdullah-kwl opened this issue · 1 comments

Feature request

Enable merge_and_unload functionality for IA3 adapters on models loaded with 4-bit or 8-bit quantization. Currently, merging fails with the error "Cannot merge ia3 layers when the model is loaded in 4-bit mode".
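
For reference, a minimal reproduction sketch is below. The base model choice is an assumption for illustration; any bitsandbytes 4-bit model with an IA3 adapter should trigger the same error:

```python
# Minimal reproduction sketch (assumes bitsandbytes is installed and a GPU is
# available; the base model name is a placeholder, not taken from the issue).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import IA3Config, get_peft_model

# Load a small causal LM in 4-bit with bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=bnb_config,
)

# Attach an IA3 adapter; peft provides default target modules for OPT.
peft_model = get_peft_model(base_model, IA3Config(task_type="CAUSAL_LM"))

# This currently raises:
# ValueError: Cannot merge ia3 layers when the model is loaded in 4-bit mode
merged_model = peft_model.merge_and_unload()
```

For comparison, LoRA layers in peft already support merging into bitsandbytes-quantized base models (by dequantizing, merging, and requantizing the affected weights), so an analogous path for IA3 seems feasible.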

Motivation

Existing merge_and_unload support excludes 4-bit quantized models with IA3 adapters. Merging IA3 adapters into a 4-bit quantized base model preserves the size reduction of quantization and simplifies deployment by producing a single, smaller model. This feature aligns with the core advantages of IA3 (a very small number of added parameters) and 4-bit quantization (memory efficiency), enabling users to fully exploit both optimizations.

Your contribution

While I cannot currently submit a pull request, I'm happy to provide further details, test the feature once it is implemented, and assist with documentation updates if needed.

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.