FMInference/FlexLLMGen
Running large language models on a single GPU for throughput-oriented scenarios.
Python · Apache-2.0
Issues
LP optimization model and constants
#140 opened - 0
File "setup.py" not found.
#137 opened - 2
Error while splitting the model name
#133 opened - 0
Why must the variable bls be less than 20?
#132 opened - 1
Implement a RESTful API for FlexGen
#130 opened - 1
"helm_run.py, line 303, in run_entry run_spec = run_specs[0] IndexError: list index out of range"
#129 opened - 1
What is the HELM version?
#128 opened - 2
Please do not abandon this project!
#126 opened - 0
Question about quantization
#119 opened - 4
How to install from source
#118 opened - 0
Why is the CPU peak memory usage set to 0?
#117 opened - 2
FlexGen without a GPU?
#115 opened - 0
Could FlexGen be used for training?
#113 opened - 0
Benchmark for 1 node with 4 GPUs
#106 opened - 4
Multi-GPU problem
#105 opened - 1
Support for LLaMA
#104 opened - 0
Interesting that you can crop 65B
#103 opened - 1
Is FlexGen+GPTQ 4bit possible?
#101 opened - 0
Support for ChatGLM
#100 opened - 1
Where is the chatbot? I miss it!
#87 opened - 1
Soft lockup after running flex_opt
#86 opened - 2
Does CPU peak_mem monitoring work?
#81 opened - 1
Support Galactica
#76 opened - 9
AttributeError
#74 opened - 2
Soft labels of FlexGen
#67 opened - 2
Context Length?
#64 opened - 1
[Apple M1 Max] TypeError: object.__new__() takes exactly one argument (the type to instantiate)
#62 opened - 3
Is LLaMa supported?
#60 opened - 28