kaiyux's Stars
Significant-Gravitas/AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
CompVis/stable-diffusion
A latent text-to-image diffusion model
ggerganov/llama.cpp
LLM inference in C/C++
meta-llama/llama
Inference code for Llama models
Stability-AI/stablediffusion
High-Resolution Image Synthesis with Latent Diffusion Models
chenfei-wu/TaskMatrix
amix/vimrc
The ultimate Vim configuration (vimrc)
Lightning-AI/pytorch-lightning
Pretrain, fine-tune, and deploy AI models on multiple GPUs and TPUs with zero code changes.
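As a rough illustration of the "zero code changes" claim, here is a minimal sketch of a LightningModule whose training run could be scaled out purely through `Trainer` arguments; the toy model, random data, and device settings are hypothetical placeholders.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl

# A toy regression model; the architecture is illustrative only.
class TinyRegressor(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self.net(x), y)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

# Random tensors stand in for a real dataset.
ds = TensorDataset(torch.randn(256, 8), torch.randn(256, 1))
loader = DataLoader(ds, batch_size=32)

# Scaling out is a Trainer configuration change, not a model-code change:
# e.g. accelerator="gpu", devices=4 would run the same module on 4 GPUs.
trainer = pl.Trainer(max_epochs=1, accelerator="auto", devices=1)
trainer.fit(TinyRegressor(), loader)
```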
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
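A minimal sketch of vLLM's offline batch-inference entry point, assuming the `LLM`/`SamplingParams` API from the project's examples; the model id and prompts are placeholders.

```python
# Hedged sketch of vLLM offline inference; the model id is a placeholder.
from vllm import LLM, SamplingParams

prompts = [
    "Explain KV-cache paging in one sentence.",
    "What is continuous batching?",
]
sampling = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# vLLM batches these prompts internally for high throughput.
llm = LLM(model="facebook/opt-125m")
for out in llm.generate(prompts, sampling):
    print(out.outputs[0].text)
```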
shengcaishizhan/kkndme_tianya
The legendary Tianya forum thread by kkndme on housing prices
apple/ml-stable-diffusion
Stable Diffusion with Core ML on Apple Silicon
ml-explore/mlx
MLX: An array framework for Apple silicon
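A small sketch of MLX's NumPy-like, lazily evaluated arrays and automatic differentiation, assuming the `mlx.core` API described in the project README; the function being differentiated is arbitrary.

```python
# Hedged sketch of MLX basics; requires Apple silicon with the mlx package installed.
import mlx.core as mx

def loss(w, x, y):
    # Simple squared error expressed in MLX array ops.
    return mx.mean((x @ w - y) ** 2)

x = mx.random.normal((64, 4))
w = mx.zeros((4,))
y = mx.random.normal((64,))

grad_fn = mx.grad(loss)   # differentiate with respect to the first argument
g = grad_fn(w, x, y)
mx.eval(g)                # computation is lazy until explicitly evaluated
print(g)
```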
openai/evals
Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.
ggerganov/ggml
Tensor library for machine learning
microsoft/LoRA
Code for loralib, an implementation of "LoRA: Low-Rank Adaptation of Large Language Models"
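To illustrate the idea loralib implements, here is a generic PyTorch sketch of a low-rank adapter on a frozen linear layer: the frozen weight is augmented with a trainable rank-r update B·A. This is a plain re-implementation of the concept, not loralib's actual classes.

```python
# Generic sketch of the LoRA idea: y = base(x) + x A^T B^T * (alpha / r).
# Not loralib's API; just the low-rank-update concept in plain PyTorch.
import torch
from torch import nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=8, alpha=16):
        super().__init__()
        self.base = nn.Linear(in_features, out_features)
        for p in self.base.parameters():
            p.requires_grad_(False)                  # freeze the pretrained layer
        self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        # Frozen path plus trainable low-rank correction.
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

layer = LoRALinear(128, 64, r=8)
trainable = [n for n, p in layer.named_parameters() if p.requires_grad]
print(trainable)   # only lora_A and lora_B remain trainable
```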
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
lucidrains/PaLM-rlhf-pytorch
Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
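As a rough sketch of the Python API the description refers to, recent TensorRT-LLM releases expose a high-level `LLM` entry point that builds the engine and runs generation from a Hugging Face model id; treat the exact import path, defaults, and model name here as assumptions that may differ between versions.

```python
# Hedged sketch of TensorRT-LLM's high-level Python (LLM) API;
# names may vary across releases, and a supported NVIDIA GPU is required.
from tensorrt_llm import LLM, SamplingParams

# Engine building happens behind this constructor; the model id is a placeholder.
llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")

sampling = SamplingParams(temperature=0.8, max_tokens=64)
for out in llm.generate(["Summarize what a TensorRT engine is."], sampling):
    print(out.outputs[0].text)
```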
facebookincubator/AITemplate
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
NVIDIA/warp
A Python framework for high performance GPU simulation and graphics
mlcommons/inference
Reference implementations of MLPerf™ inference benchmarks
pytorch/torchdynamo
A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
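TorchDynamo has since been merged into PyTorch, where it serves as the graph-capture front end of `torch.compile`; a minimal sketch of the "unmodified program, faster execution" workflow follows, with a toy model standing in for real code.

```python
# Sketch of TorchDynamo via torch.compile (PyTorch 2.x); the model is a toy stand-in.
import torch
from torch import nn

model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 10))

# TorchDynamo captures the Python-level ops and hands them to a backend compiler;
# the module itself is unchanged.
compiled = torch.compile(model)

x = torch.randn(32, 64)
print(compiled(x).shape)   # torch.Size([32, 10])
```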
huggingface/optimum-nvidia
triton-inference-server/pytriton
PyTriton is a Flask/FastAPI-like interface that simplifies Triton's deployment in Python environments.
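A minimal sketch of what that Flask/FastAPI-like flow looks like, assuming the `Triton`/`bind` API shown in the pytriton README; the model name, tensor shapes, and callable are placeholders, and argument names may differ between versions.

```python
# Hedged sketch of serving a Python callable with PyTriton.
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

@batch
def add_one(inputs):
    # Trivial "model": add 1 to every element of the batched input.
    return {"outputs": inputs + 1.0}

with Triton() as triton:
    triton.bind(
        model_name="AddOne",
        infer_func=add_one,
        inputs=[Tensor(name="inputs", dtype=np.float32, shape=(-1,))],
        outputs=[Tensor(name="outputs", dtype=np.float32, shape=(-1,))],
        config=ModelConfig(max_batch_size=8),
    )
    triton.serve()   # blocks, exposing HTTP/gRPC endpoints like native Triton
```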
pytorch/PiPPy
Pipeline Parallelism for PyTorch
triton-inference-server/tensorrtllm_backend
The Triton TensorRT-LLM Backend
NVIDIA/multi-gpu-programming-models
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
triton-inference-server/backend
Common source, scripts and utilities for creating Triton backends.
NVIDIA/pyxis
Container plugin for Slurm Workload Manager
microsoft/Accera
Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research