FMInference/FlexLLMGen

Peak GPU memory usage does not scale linearly with the weight GPU percentage


Command 1:

```
python -m flexgen.flex_opt --model facebook/opt-30b --path _DUMMY_ --prompt-len 20 --gen-len 15 --percent 25 75 60 40 0 100 --gpu-batch-size 1 --num-gpu-batches 2 --cpu-cache-compute --debug fewer_batch
```

peak gpu mem: 6.0679 GB

Command 2:

```
python -m flexgen.flex_opt --model facebook/opt-30b --path _DUMMY_ --prompt-len 20 --gen-len 15 --percent 30 70 60 40 0 100 --gpu-batch-size 1 --num-gpu-batches 2 --cpu-cache-compute --debug fewer_batch
```

GPU OOM
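
For context, the six values of `--percent` are, per the FlexGen README, the GPU/CPU split for weights, KV cache, and activations, in that order (whatever a pair does not cover spills to disk). A minimal sketch of that mapping for the two commands above:

```python
# Sketch of the --percent layout, following the order documented in the
# FlexGen README: weight GPU/CPU, KV-cache GPU/CPU, activation GPU/CPU
# (any remainder of a pair is offloaded to disk).
fields = ["weight_gpu", "weight_cpu",
          "cache_gpu", "cache_cpu",
          "act_gpu", "act_cpu"]
command_1 = dict(zip(fields, [25, 75, 60, 40, 0, 100]))
command_2 = dict(zip(fields, [30, 70, 60, 40, 0, 100]))
# Only the weight GPU/CPU split differs between the two commands.
print(command_1)
print(command_2)
```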

The only difference between command 2 and command 1 is that the weight GPU percentage (the first value of `--percent`) increases from 25% to 30%.

My GPU has 24 GB of memory.
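
As a back-of-envelope check (my own estimate, assuming FP16 weights, i.e. OPT-30B ≈ 30e9 params × 2 bytes ≈ 60 GB in total): a 5-point increase in the weight GPU percentage should add only about 3 GB if memory scaled linearly, pushing the 6.07 GB peak to roughly 9 GB, which is well under the 24 GB capacity, yet command 2 runs out of memory:

```python
# Hedged back-of-envelope estimate: assumes FP16 (2-byte) weights and
# ~30e9 parameters for OPT-30B; exact sizes may differ in practice.
total_weight_gb = 30e9 * 2 / 1e9               # ~60 GB of weights overall
delta_gb = total_weight_gb * (30 - 25) / 100   # extra weights from +5 points
expected_peak_gb = 6.0679 + delta_gb           # if scaling were linear
print(f"extra weight memory: ~{delta_gb:.0f} GB")
print(f"expected peak: ~{expected_peak_gb:.1f} GB (vs. 24 GB capacity)")
```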