FMInference/FlexLLMGen

NotImplementedError on --percent 50 50 50 50 50 50

(base) ub2004@ub2004-B85M-A0:~/nndev/FlexGen_yk$ python3 -m flexgen.flex_opt --model facebook/opt-1.3b --gpu-batch-size 1 --percent 50 50 50 50 50 50
<run_flexgen>: args.model: facebook/opt-1.3b
get_opt_config is: <function get_opt_config at 0x7f5009c2e320>
model size: 2.443 GB, cache size: 0.100 GB, hidden size (prefill): 0.002 GB
init weight...
Traceback (most recent call last):
  File "/home/ub2004/anaconda3/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/ub2004/anaconda3/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/ub2004/nndev/FlexGen_yk/flexgen/flex_opt.py", line 1328, in <module>
    run_flexgen(args)
  File "/home/ub2004/nndev/FlexGen_yk/flexgen/flex_opt.py", line 1220, in run_flexgen
    model = OptLM(opt_config, env, args.path, policy)
  File "/home/ub2004/nndev/FlexGen_yk/flexgen/flex_opt.py", line 615, in __init__
    raise NotImplementedError()
NotImplementedError
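
A possible reading of the failure, sketched under assumptions rather than confirmed against this checkout: the six --percent values appear to be the GPU and CPU shares for weights, attention cache, and activations, in that order, with the remainder going to disk. The check at flex_opt.py line 615 is assumed here to require activations to sit entirely on one device, which "50 50" for the last pair would violate. The helper names below (split_percent, activation_on_single_device) are hypothetical and not part of flexgen.

```python
from typing import Dict, List, Tuple


def split_percent(percent: List[int]) -> Dict[str, Tuple[int, int, int]]:
    """Expand six GPU/CPU percentages into (gpu, cpu, disk) triples.

    Assumed order: weight GPU, weight CPU, cache GPU, cache CPU,
    activation GPU, activation CPU. Disk gets whatever is left over.
    """
    if len(percent) != 6:
        raise ValueError("expected six percentages")
    placement = {}
    for i, name in enumerate(("weight", "cache", "activation")):
        gpu, cpu = percent[2 * i], percent[2 * i + 1]
        disk = 100 - gpu - cpu
        if min(gpu, cpu, disk) < 0:
            raise ValueError(f"{name} percentages must be >= 0 and sum to <= 100")
        placement[name] = (gpu, cpu, disk)
    return placement


def activation_on_single_device(percent: List[int]) -> bool:
    """True if activations are 100% on GPU, CPU, or disk (assumed requirement)."""
    return 100 in split_percent(percent)["activation"]


print(activation_on_single_device([50, 50, 50, 50, 50, 50]))   # False
print(activation_on_single_device([50, 50, 50, 50, 100, 0]))   # True
```

If that assumption holds, a run such as --percent 50 50 50 50 100 0 would be worth trying, since it keeps the weight and cache splits but places activations fully on the GPU.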