Pinned Repositories
AI-System
System for AI Education Resource.
apollo
An open autonomous driving platform
CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
cutlass
CUDA Templates for Linear Algebra Subroutines
flash-attention
Fast and memory-efficient exact attention
llama.cpp
Port of Facebook's LLaMA model in C/C++
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Paddle-Lite
Multi-platform high-performance deep learning inference engine for PaddlePaddle (飞桨)
sglang
SGLang is a fast serving framework for large language models and vision language models.
text-generation-inference
Large Language Model Text Generation Inference
jameswu2014's Repositories
jameswu2014/llama.cpp
Port of Facebook's LLaMA model in C/C++
jameswu2014/AI-System
System for AI Education Resource.
jameswu2014/apollo
An open autonomous driving platform
jameswu2014/CosyVoice
Multilingual large voice generation model with full-stack support for inference, training, and deployment.
jameswu2014/cutlass
CUDA Templates for Linear Algebra Subroutines
jameswu2014/flash-attention
Fast and memory-efficient exact attention
jameswu2014/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
jameswu2014/Paddle-Lite
Multi-platform high-performance deep learning inference engine for PaddlePaddle (飞桨)
jameswu2014/sglang
SGLang is a fast serving framework for large language models and vision language models.
jameswu2014/text-generation-inference
Large Language Model Text Generation Inference
jameswu2014/tutorials
Tutorials for creating and using ONNX models
jameswu2014/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
jameswu2014/verl
verl: Volcano Engine Reinforcement Learning for LLMs