kaiyux's Stars
Significant-Gravitas/AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
CompVis/stable-diffusion
A latent text-to-image diffusion model
ggerganov/llama.cpp
LLM inference in C/C++
meta-llama/llama
Inference code for Llama models
Stability-AI/stablediffusion
High-Resolution Image Synthesis with Latent Diffusion Models
chenfei-wu/TaskMatrix
amix/vimrc
The ultimate Vim configuration (vimrc)
Lightning-AI/pytorch-lightning
Pretrain, fine-tune, and deploy AI models on multiple GPUs and TPUs with zero code changes.
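As a rough illustration of the "zero code changes" claim, here is a minimal sketch of a LightningModule whose training run could be scaled out purely through `Trainer` arguments; the toy model, random data, and device settings are hypothetical placeholders.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

# A toy regression model; the architecture is illustrative only.
class TinyRegressor(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Random tensors stand in for a real dataset.
ds = TensorDataset(torch.randn(256, 8), torch.randn(256, 1))
loader = DataLoader(ds, batch_size=32)

# Scaling out is a Trainer configuration change, not a model-code change:
# e.g. accelerator="gpu", devices=4 would run the same module on 4 GPUs.
trainer = pl.Trainer(max_epochs=1, accelerator="auto", devices=1)
trainer.fit(TinyRegressor(), loader)
```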
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
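A minimal sketch of vLLM's offline batch-inference entry point, assuming the `LLM`/`SamplingParams` API from the project's examples; the model id and prompts are placeholders.

```python
# Hedged sketch of vLLM offline inference; the model id is a placeholder.
from vllm import LLM, SamplingParams

prompts = [
    "Explain KV-cache paging in one sentence.",
    "What is continuous batching?",
]
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# vLLM batches these prompts internally for high throughput.
llm = LLM(model="facebook/opt-125m")
for out in llm.generate(prompts, sampling):
    print(out.outputs[0].text)
```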
shengcaishizhan/kkndme_tianya
The legendary Tianya forum thread by kkndme on housing prices
apple/ml-stable-diffusion
Stable Diffusion with Core ML on Apple Silicon
ml-explore/mlx
MLX: An array framework for Apple silicon
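A small sketch of MLX's NumPy-like, lazily evaluated arrays and automatic differentiation, assuming the `mlx.core` API described in the project README; the function being differentiated is arbitrary.

```python
# Hedged sketch of MLX basics; requires Apple silicon with the mlx package installed.
import mlx.core as mx

def loss(w, x, y):
    # Simple squared error expressed in MLX array ops.
    return mx.mean((x @ w - y) ** 2)

x = mx.random.normal((64, 4))
w = mx.zeros((4,))
y = mx.random.normal((64,))

grad_fn = mx.grad(loss)   # differentiate with respect to the first argument
g = grad_fn(w, x, y)
mx.eval(g)                # computation is lazy until explicitly evaluated
print(g)
```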
openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
ggerganov/ggml
Tensor library for machine learning
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
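To illustrate the idea loralib implements, here is a generic PyTorch sketch of a low-rank adapter on a frozen linear layer: the frozen weight is augmented with a trainable rank-r update B·A. This is a plain re-implementation of the concept, not loralib's actual classes.

```python
# Generic sketch of the LoRA idea: y = base(x) + x A^T B^T * (alpha / r).
# Not loralib's API; just the low-rank-update concept in plain PyTorch.
import torch
from torch import nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)                  # freeze the pretrained layer
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path plus trainable low-rank correction.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(128, 64, r=8)
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
print(trainable)   # only lora_A and lora_B remain trainable
```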
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
lucidrains/PaLM-rlhf-pytorch
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
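As a rough sketch of the Python API the description refers to, recent TensorRT-LLM releases expose a high-level `LLM` entry point that builds the engine and runs generation from a Hugging Face model id; treat the exact import path, defaults, and model name here as assumptions that may differ between versions.

```python
# Hedged sketch of TensorRT-LLM's high-level Python (LLM) API;
# names may vary across releases, and a supported NVIDIA GPU is required.
from tensorrt_llm import LLM, SamplingParams

# Engine building happens behind this constructor; the model id is a placeholder.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

sampling = SamplingParams(temperature=0.8, max_tokens=64)
for out in llm.generate(["Summarize what a TensorRT engine is."], sampling):
    print(out.outputs[0].text)
```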
facebookincubator/AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
NVIDIA/warp
A Python framework for high performance GPU simulation and graphics
mlcommons/inference
Reference implementations of MLPerf™ inference benchmarks
pytorch/torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
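TorchDynamo has since been merged into PyTorch, where it serves as the graph-capture front end of `torch.compile`; a minimal sketch of the "unmodified program, faster execution" workflow follows, with a toy model standing in for real code.

```python
# Sketch of TorchDynamo via torch.compile (PyTorch 2.x); the model is a toy stand-in.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

# TorchDynamo captures the Python-level ops and hands them to a backend compiler;
# the module itself is unchanged.
compiled = torch.compile(model)

x = torch.randn(32, 64)
print(compiled(x).shape)   # torch.Size([32, 10])
```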
huggingface/optimum-nvidia
triton-inference-server/pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
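A minimal sketch of what that Flask/FastAPI-like flow looks like, assuming the `Triton`/`bind` API shown in the pytriton README; the model name, tensor shapes, and callable are placeholders, and argument names may differ between versions.

```python
# Hedged sketch of serving a Python callable with PyTriton.
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

@batch
def add_one(inputs):
    # Trivial "model": add 1 to every element of the batched input.
    return {"outputs": inputs + 1.0}

with Triton() as triton:
    triton.bind(
        model_name="AddOne",
        infer_func=add_one,
        inputs=[Tensor(name="inputs", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="outputs", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()   # blocks, exposing HTTP/gRPC endpoints like native Triton
```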
pytorch/PiPPy
Pipeline Parallelism for PyTorch
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
NVIDIA/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
triton-inference-server/backend
Common source, scripts and utilities for creating Triton backends.
NVIDIA/pyxis
Container plugin for Slurm Workload Manager
microsoft/Accera
Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research