CUDA out of memory on flan-ul2
sigmareaver opened this issue · 1 comment
sigmareaver commented
Tested on a 4090.
Using the command:
python t5.py ../full-models/flan-ul2 c4 --wbits 4 --act-order --groupsize 128 --save ../gptq-models/flan-ul2-gptq/flan-ul2-4bit-128g-gptq.pt
What is the memory requirement for quantizing a 20B model? I thought it should only need to hold one layer at a time on the GPU?
sigmareaver commented
Was able to quantize using --nsamples 256 and by hacking part of the code in t5_sequential (the part that applies the final layer norms and dropout) to run on CPU.
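For anyone hitting the same OOM, a minimal sketch of what that CPU offload could look like, assuming the HuggingFace T5/UL2 module layout (encoder.final_layer_norm and encoder.dropout) and a list of captured hidden states. The function name and the inps variable are hypothetical and not part of the actual t5_sequential code:

```python
import torch

def run_final_ops_on_cpu(model, inps):
    # Hypothetical helper: apply the encoder's final layer norm and dropout
    # on CPU so they don't compete with the quantization buffers for VRAM.
    # Attribute names follow the HuggingFace T5Stack layout.
    final_layer_norm = model.encoder.final_layer_norm.cpu()
    dropout = model.encoder.dropout.cpu()

    outs = []
    with torch.no_grad():
        for inp in inps:
            hidden = final_layer_norm(inp.cpu())
            hidden = dropout(hidden)
            outs.append(hidden)
    return outs
```

The same idea applies to the decoder's final layer norm; the rest of the per-layer quantization loop stays on the GPU.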