qwopqwop200/GPTQ-for-LLaMa

AttributeError: 'QuantLinear' object has no attribute 'weight' (t5 branch) (Google/flan-ul2)

sigmareaver opened this issue · 2 comments

i7-13700k
128GB RAM
RTX 4090

Python = 3.9.10
Transformers = 4.30.0.dev0
PyTorch = 2.0.1
Model = Google/flan-ul2

Quantization command:

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:512 python t5.py ../full-models/flan-ul2 wikitext2 --nsamples 256 --wbits 4 --act-order --groupsize 128 --save ../gptq-models/flan-ul2-gptq/flan-ul2-4bit-128g-gptq.pt

I also had to edit t5_sequential() so that quantization fits in 24GB of VRAM, but I don't think this should affect the model. The following snippet shows the extent of my changes, apart from adding import gc at the top of the file.

        del layer
        del gptq 
        gc.collect()
        torch.cuda.empty_cache()

        inps, outs = outs, inps
        
    # do this part on CPU, because GPU runs out of memory
    dev = 'cpu'

    model.encoder.final_layer_norm = model.encoder.final_layer_norm.to(dev)
    model.encoder.dropout = model.encoder.dropout.to(dev)
    
    encoder_hidden_states = model.encoder.final_layer_norm(inps.cpu())
    encoder_hidden_states = model.encoder.dropout(encoder_hidden_states)
    
    model.encoder.final_layer_norm = model.encoder.final_layer_norm.cpu()
    model.encoder.dropout = model.encoder.dropout.cpu()

    dev = 'cuda:0'
    encoder_hidden_states = encoder_hidden_states.to(dev)
    inps = inps.to(dev)
    # end of CPU section

Without this change, my 4090 runs out of memory at the line model.encoder.final_layer_norm = model.encoder.final_layer_norm.to(dev), which moves the layer to the GPU.

Benchmark command (also applies to t5_inference.py):

python t5.py ../full-models/flan-ul2 wikitext2 --load ../gptq-models/flan-ul2-gptq/flan-ul2-4bit-128g-gptq.pt --wbits 4 --groupsize 128 --benchmark --benchmark_mode mmlu

Yields the following error:

Traceback (most recent call last):
  File "/mnt/Storage/ai-dev/t5-gptq/t5.py", line 752, in <module>
    mmlu_benchmark(model, tokenizer, args)
  File "/mnt/Storage/ai-dev/t5-gptq/t5.py", line 542, in mmlu_benchmark
    cors, acc, probs = mmlu_eval(args, subject, model, tokenizer, dev_df, test_df, (idx,len(subjects)))
  File "~/anaconda3/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt/Storage/ai-dev/t5-gptq/t5.py", line 473, in mmlu_eval
    logits = model(
  File "~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1683, in forward
    encoder_outputs = self.encoder(
  File "~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 1090, in forward
    layer_outputs = layer_module(
  File "~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 753, in forward
    hidden_states = self.layer[-1](hidden_states)
  File "~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 342, in forward
    forwarded_states = self.DenseReluDense(forwarded_states)
  File "~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "~/anaconda3/lib/python3.9/site-packages/transformers/models/t5/modeling_t5.py", line 319, in forward
    isinstance(self.wo.weight, torch.Tensor)
  File "~/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1614, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'QuantLinear' object has no attribute 'weight'

Edit: added snippet showing code modifications, and edited quantization command to show PYTORCH_CUDA_ALLOC_CONF environment variable.

Not sure what I did differently, but it started suggesting qweight now...
AttributeError: 'QuantLinear' object has no attribute 'weight'. Did you mean: 'qweight'?
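For anyone hitting the same message, here is a minimal sketch of why the lookup fails. It uses plain Python stand-ins rather than the real classes: FakeModule, QuantLinear, and Linear below are hypothetical and only imitate the behaviour visible in the traceback, where the layer exposes qweight but no weight. The dense_weight helper is my own illustration, not part of transformers or this repo.

```python
class FakeModule:
    """Hypothetical stand-in that imitates the error raised in the traceback."""

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal attribute lookup fails.
        raise AttributeError(
            "'{}' object has no attribute '{}'".format(type(self).__name__, name)
        )


class QuantLinear(FakeModule):
    """Stand-in for a quantized layer: packed data under `qweight`, no `weight`."""

    def __init__(self):
        self.qweight = [0, 1, 2]


class Linear(FakeModule):
    """Stand-in for an ordinary layer that does expose `weight`."""

    def __init__(self):
        self.weight = [0.5, 1.5]


def unguarded(module):
    """Mirror the failing access: read `.weight` unconditionally."""
    return module.weight


def dense_weight(module):
    """Return `module.weight` if it exists, otherwise None (no exception)."""
    return getattr(module, "weight", None)


if __name__ == "__main__":
    print(dense_weight(Linear()))       # the ordinary layer has a weight
    print(dense_weight(QuantLinear()))  # the quantized stand-in does not
    try:
        unguarded(QuantLinear())
    except AttributeError as exc:
        print(exc)
```

Any code path that reads .weight without a guard like this will fail on a layer that has been swapped for a quantized one, which matches the last frames of the traceback.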

My apologies. It seems a requirement was somehow not installed, or overwritten with your transformers-t5 repo.
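In case it helps someone else check their environment, below is a rough sketch that reports which copy of a package Python actually imports and whether a given file in it contains a given string. The helper names describe_install and source_mentions are my own. Searching for self.wo.weight is only an assumption based on the line shown in the traceback; I have not verified what the patched file contains.

```python
import importlib.metadata
import importlib.util
from pathlib import Path


def describe_install(package):
    """Report the installed version and on-disk location of a top-level package."""
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        version = None  # no distribution metadata registered under this name
    spec = importlib.util.find_spec(package)
    location = spec.origin if spec is not None else None
    return {"package": package, "version": version, "location": location}


def source_mentions(package, relative_path, needle):
    """Return True/False if the file exists in the package, else None."""
    spec = importlib.util.find_spec(package)
    if spec is None or spec.origin is None:
        return None
    path = Path(spec.origin).parent / relative_path
    if not path.is_file():
        return None
    return needle in path.read_text(encoding="utf-8")


if __name__ == "__main__":
    # Which transformers is importable, and from where?
    print(describe_install("transformers"))
    # Does its T5 modeling file still contain the access seen in the traceback?
    print(source_mentions("transformers", "models/t5/modeling_t5.py", "self.wo.weight"))
```

If the reported location points into site-packages for a stock release rather than the expected checkout, a later install has probably replaced the package.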