jameswu2014

Pinned Repositories

AI-System
System for AI Education Resource.
Language: Python
apollo
An open autonomous driving platform
Language: C++
CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language: Python
cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++
flash-attention
Fast and memory-efficient exact attention
Language: Python
llama.cpp
Port of Facebook's LLaMA model in C/C++
Language: C
lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Language: Python
Paddle-Lite
Multi-platform high-performance deep learning inference engine from PaddlePaddle (『飞桨』)
Language: C++
sglang
SGLang is a fast serving framework for large language models and vision language models.
Language: Python
text-generation-inference
Large Language Model Text Generation Inference
Language: Python

jameswu2014's Repositories

jameswu2014/llama.cpp
Port of Facebook's LLaMA model in C/C++
Language: C
jameswu2014/AI-System
System for AI Education Resource.
Language: Python
jameswu2014/apollo
An open autonomous driving platform
Language: C++
jameswu2014/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Language: Python
jameswu2014/cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++
jameswu2014/flash-attention
Fast and memory-efficient exact attention
Language: Python
jameswu2014/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Language: Python
jameswu2014/Paddle-Lite
Multi-platform high-performance deep learning inference engine from PaddlePaddle (『飞桨』)
Language: C++
jameswu2014/sglang
SGLang is a fast serving framework for large language models and vision language models.
Language: Python
jameswu2014/text-generation-inference
Large Language Model Text Generation Inference
Language: Python
jameswu2014/tutorials
Tutorials for creating and using ONNX models
Language: Jupyter Notebook
jameswu2014/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language: Python
jameswu2014/verl
verl: Volcano Engine Reinforcement Learning for LLMs
Language: Python