FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
Python · Apache-2.0
Issues
LP optimization model and constants
#140 opened - 0
File "setup.py" not found.
#137 opened - 2
Error while splitting the model name
#133 opened - 0
Why must the variable bls be less than 20?
#132 opened - 1
Implement a RESTful API for FlexGen
#130 opened - 1
"helm_run.py, line 303, in run_entry run_spec = run_specs[0] IndexError: list index out of range"
#129 opened - 1
What is the HELM version?
#128 opened - 2
Please do not abandon this project!
#126 opened - 0
Question about quantization
#119 opened - 4
How to install from source
#118 opened - 0
Why is the CPU peak memory usage set to 0?
#117 opened - 2
FlexGen without a GPU?
#115 opened - 0
Could FlexGen be used for training?
#113 opened - 0
Benchmark for 1 node with 4 GPUs
#106 opened - 4
Multi-GPU problem
#105 opened - 1
Support for LLaMA
#104 opened - 0
Interesting that you can crop 65B
#103 opened - 1
Is FlexGen+GPTQ 4bit possible?
#101 opened - 0
Support for ChatGLM
#100 opened - 1
Where is the chatbot? I miss it!
#87 opened - 1
Soft lockup after running flex_opt
#86 opened - 2
Does CPU peak_mem monitoring work?
#81 opened - 1
Support Galactica
#76 opened - 9
AttributeError
#74 opened - 2
Soft labels of FlexGen
#67 opened - 2
Context Length?
#64 opened - 1
[Apple M1 Max] TypeError: object.__new__() takes exactly one argument (the type to instantiate)
#62 opened - 3
Is LLaMa supported?
#60 opened - 28