FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
Python · Apache-2.0
Issues
LP optimization model and constants
#140 opened by dimanzt - 1
"helm_run.py, line 303, in run_entry run_spec = run_specs[0] IndexError: list index out of range"
#129 opened by hjk1231 - 0
File "setup.py" not found.
#137 opened by Learn2006 - 3
Please do not abandon this project!
#126 opened by oobabooga - 1
Error while split the model name
#133 opened by neomi-tenenbaum-huawei - 0
Why the variable bls must be less than 20?
#132 opened by LHQUer - 9
Implement RESTful API of FlexGen
#130 opened by Fyphen1223 - 1
what is the helm version?
#128 opened by oujieww - 1
How can I calculate `*mm_flops*` on other GPU which is used in cost_model.py?
#122 opened by minhopark-neubla - 0
【PLS!】I want to know how to generate ray_bootstrap_config.yaml for my own cluster
#123 opened by KylinC - 1
Benchmark for 1 node with 4 GPUs
#106 opened by QiaolingChen00 - 28
Is LLaMa supported?
#60 opened by NightMachinery - 4
how to install from source
#118 opened by SeekPoint - 2
question about quantization
#119 opened by xinhaoc - 1
ValueError: Invalid model name: galactica-30b
#99 opened by vmajor - 0
Why is the CPU peak memory usage set to 0?
#117 opened by KAIWEILIUCC - 0
flexgen without GPU?
#115 opened by AnatoliChe - 0
NotImplementedError on --percent 50 50 50 50 50 50
#114 opened by SeekPoint - 0
Could flexgen be used for training?
#113 opened by leiwen83 - 4
MultiGPU problem
#105 opened by robinzixuan - 0
Support for MoE models (see Switch Transformer, NLLB)
#109 opened by fiqas - 0
Peak gpu memory use not scale linearly with the percentage of gpu usage of weight
#108 opened by frankxyy - 0
Support for LLaMA
#104 opened by ustcwhy - 0
interesting you can crop 65b
#103 opened by seoeaa - 1
Soft lockup after running flex_opt
#86 opened by zhang677 - 1
Is FlexGen+GPTQ 4bit possible?
#101 opened by BarfingLemurs - 0
Support for ChatGLM
#100 opened by AldarisX - 7
Where is the chatbot? I miss it!
#87 opened by fuhengwu2021 - 0
is cpu peak_mem monitoring work?
#81 opened by dlfrnaos19 - 0
Support Galactica
#76 opened by 2003pro - 3
Offloading to disk does not work (opt-66b)
#61 opened by Paethon - 2
opt-175b model how to load model from disc.
#73 opened by prof-schacht - 0
AttributeError
#74 opened by shadowcz007 - 1
Soft Label of Flexgen
#67 opened by 2003pro - 2
Issue with flexgen when running python script
#65 opened by PsoriasiIR - 1
Context Length?
#64 opened by Latrasis - 1
[Apple M1 Max] TypeError: object.__new__() takes exactly one argument (the type to instantiate)
#62 opened by certik - 1