vectorch-ai/ScaleLLM
A high-performance inference system for large language models, designed for production environments.
C++ · Apache-2.0
Issues
Mistral large GPTQ model inference problem
#308 opened by drdaliang - 1
RuntimeError: Timed out
#310 opened by spongxin - 3
The process terminated before reaching the specified max_tokens, despite setting ignore_eos=True and max_tokens.
#304 opened by HowardChenRV - 2
pytest core dump in workflow
#258 opened by guocuimi - 20
ScaleLLM vs vLLM in performance
#144 opened by WangErXiao - 0
pip install scalellm fails.
#212 opened by liutongxuan - 0
install cpython shared lib in manylinux docker image
#215 opened by guocuimi - 3
ScaleLLM Roadmap
#84 opened by guocuimi - 1
[Core] Core dump on the chatglm3 model using ScaleLLM.
#221 opened by liutongxuan - 1
[Correctness] Using llama-2-7b-hf, ScaleLLM's output differs from vLLM's output.
#220 opened by liutongxuan - 0
Developing Python wrapper for easier integration
#161 opened by guocuimi - 0
Introducing the Mamba model
#165 opened by guocuimi - 0
Enhancing documentation for improved usability
#162 opened by guocuimi - 0
Exploring other chips, such as TPUs
#160 opened by guocuimi - 0
Loosening coupling with PyTorch for easy deployment
#159 opened by guocuimi - 0
Adding more Prometheus metrics and creating a Grafana dashboard for monitoring.
#158 opened by guocuimi - 0
Extending support to macOS and Windows platforms
#156 opened by guocuimi - 0
Structural Decoding: Function Calling
#155 opened by guocuimi - 0
Structural Decoding: Json format
#154 opened by guocuimi - 0
Structural Decoding: Json format
#153 opened by guocuimi - 0
GPU Arch: Turing architecture (sm75)
#152 opened by guocuimi - 0
Adding support for Apple chips
#151 opened by guocuimi - 0
Introducing multi-modal models (LLaVA model)
#150 opened by guocuimi - 0
Implementing MoE (Mixture of Experts) kernels
#149 opened by guocuimi - 0
Exploring lookahead decoding support
#146 opened by guocuimi - 4
Support for Visual Models (e.g. LLaVA)
#75 opened by omarmhaimdat - 2
The API output lacks the "usage" field, which causes compatibility issues when using the API with other tools.
#34 opened by BUJIDAOVS - 3
Driver Version: 535.54.03, CUDA Version: 12.2: running fails with the error "OpenAI API returned an error 503: {"error":{"code":14,"message":"connection error: desc = \"transport: Error while dialing: dial tcp: lookup scalellm on 127.0.0.11:53: server misbehaving\""}}"
#37 opened by Missliuff - 1
Can it support Mac M1?
#49 opened by zyxcambridge - 1
scalellm exited with code 137
#38 opened by yisiliang - 9
grpc server connection error
#32 opened by Arcmoon-Hu - 3
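When using yi-34b, the model does not stop generating on its own and keeps producing low-quality, repetitive content. How should this be adjusted?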
#31 opened by BUJIDAOVS