AutoGPTQ/AutoGPTQ

[BUG] Transformers >= 4.39.0 broke llama quantization

Closed this issue · 2 comments

Transformers >= 4.39.0 changed the llama model layer code, breaking quantization. I suspect other model types have the same issue. The test passes with 4.38.2 and fails with both 4.39.0 and 4.39.1.
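
The failure mechanism: during quantization AutoGPTQ temporarily replaces the first decoder layer with a `LayerHijacker` wrapper that stores the real layer under `module`, while `LlamaModel._update_causal_mask` in transformers >= 4.39.0 probes `self.layers[0].self_attn` directly. The wrapper does not forward that attribute lookup, so the probe raises. A minimal sketch of the mechanism with simplified stand-ins (not the actual AutoGPTQ classes):

```python
import torch.nn as nn

class DecoderLayer(nn.Module):
    """Stand-in for LlamaDecoderLayer: exposes a self_attn submodule."""
    def __init__(self):
        super().__init__()
        self.self_attn = nn.Linear(8, 8)

class Hijacker(nn.Module):
    """Stand-in for AutoGPTQ's LayerHijacker: wraps the real layer
    under `module` and does not forward other attribute lookups."""
    def __init__(self, module):
        super().__init__()
        self.module = module

layers = nn.ModuleList([Hijacker(DecoderLayer())])

# transformers >= 4.39.0 probes the first layer like this inside
# LlamaModel._update_causal_mask:
try:
    hasattr(layers[0].self_attn, "past_key_value")  # static cache check
except AttributeError as e:
    print(e)  # 'Hijacker' object has no attribute 'self_attn'
```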

__________________________________________________________________________________________________________________________ TestQuantization.test_quantize_1 __________________________________________________________________________________________________________________________

a = (<tests.test_quantization.TestQuantization testMethod=test_quantize_1>,), kw = {}

    @wraps(func)
    def standalone_func(*a, **kw):
>       return func(*(a + p.args), **p.kwargs, **kw)

/root/miniconda3/lib/python3.11/site-packages/parameterized/parameterized.py:620: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
test_quantization.py:37: in test_quantize
    model.quantize(examples)
/root/miniconda3/lib/python3.11/site-packages/torch/utils/_contextlib.py:115: in decorate_context
    return func(*args, **kwargs)
../auto_gptq/modeling/_base.py:408: in quantize
    self.model(**example)
/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:1196: in forward
    outputs = self.model(
/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:990: in forward
    causal_mask = self._update_causal_mask(attention_mask, inputs_embeds, cache_position)
/root/miniconda3/lib/python3.11/site-packages/transformers/models/llama/modeling_llama.py:1067: in _update_causal_mask
    if hasattr(self.layers[0].self_attn, "past_key_value"):  # static cache
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = LayerHijacker(
  (module): LlamaDecoderLayer(
    (self_attn): LlamaSdpaAttention(
      (q_proj): Linear(in_features=...      (act_fn): SiLU()
    )
    (input_layernorm): LlamaRMSNorm()
    (post_attention_layernorm): LlamaRMSNorm()
  )
), name = 'self_attn'

    def __getattr__(self, name: str) -> Any:
        if '_parameters' in self.__dict__:
            _parameters = self.__dict__['_parameters']
            if name in _parameters:
                return _parameters[name]
        if '_buffers' in self.__dict__:
            _buffers = self.__dict__['_buffers']
            if name in _buffers:
                return _buffers[name]
        if '_modules' in self.__dict__:
            modules = self.__dict__['_modules']
            if name in modules:
                return modules[name]
>       raise AttributeError(f"'{type(self).__name__}' object has no attribute '{name}'")
E       AttributeError: 'LayerHijacker' object has no attribute 'self_attn'

/root/miniconda3/lib/python3.11/site-packages/torch/nn/modules/module.py:1688: AttributeError
-------------------------------------------------------------------------------------------------------------------------------- Captured stdout call --------------------------------------------------------------------------------------------------------------------------------
mode quant_config: BaseQuantizeConfig(bits=4, group_size=128, damp_percent=0.01, desc_act=False, static_groups=False, sym=True, true_sequential=True, method='gptq', format='marlin', model_name_or_path=None, model_file_base_name=None))
============================================================================================================================== short test summary info ===============================================================================================================================
FAILED test_quantization.py::TestQuantization::test_quantize_0 - AttributeError: 'LayerHijacker' object has no attribute 'self_attn'
FAILED test_quantization.py::TestQuantization::test_quantize_1 - AttributeError: 'LayerHijacker' object has no attribute 'self_attn'
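
A workaround that makes the wrapper transparent to this probe is to delegate unknown attribute lookups to the wrapped layer. A minimal sketch of that idea (hypothetical, not necessarily what the eventual fix does):

```python
import torch.nn as nn

class TransparentHijacker(nn.Module):
    """Wrapper that falls back to the wrapped decoder layer for
    attributes it cannot resolve itself (e.g. self_attn)."""
    def __init__(self, module):
        super().__init__()
        self.module = module

    def __getattr__(self, name):
        try:
            # nn.Module.__getattr__ resolves registered parameters,
            # buffers, and submodules (including `module` itself).
            return super().__getattr__(name)
        except AttributeError:
            # Anything else is looked up on the wrapped layer.
            return getattr(super().__getattr__("module"), name)

# With this, the _update_causal_mask probe no longer raises:
inner = nn.Module()
inner.self_attn = nn.Linear(8, 8)  # stand-in for the real attention module
wrapped = TransparentHijacker(inner)
print(hasattr(wrapped.self_attn, "past_key_value"))  # False, no AttributeError
```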

@Qubitium thanks a lot, will look into it

Fixed in #607