akaitsuki-ii's Stars
zero-peak/ZeroOmega
Manage and switch between multiple proxies quickly & easily.
clash-verge-rev/clash-verge-rev
Continuation of Clash Verge - A Clash Meta GUI based on Tauri (Windows, macOS, Linux)
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
kvcache-ai/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
mem0ai/mem0
The Memory layer for your AI apps
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
microsoft/MInference
To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparsity, reducing pre-filling inference latency by up to 10x on an A100 while maintaining accuracy.
gpu-mode/lectures
Material for gpu-mode lectures
Morakito/Real-Time-Rendering-4th-CN
Chinese translation of Real-Time Rendering, 4th Edition (RTR4)
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
huggingface/trl
Train transformer language models with reinforcement learning.
chenzomi12/AISystem
AISystem covers the full AI systems stack, including AI chips, AI compilers, and AI inference and training frameworks.
2noise/ChatTTS
A generative speech model for daily dialogue.
NVIDIA-Merlin/Transformers4Rec
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
NVIDIA-Merlin/HugeCTR
HugeCTR is a high-efficiency GPU framework designed for Click-Through-Rate (CTR) estimation training.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
facebookresearch/generative-recommenders
Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
HFAiLab/hai-platform
A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute.
deepseek-ai/DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
BlackSamorez/tensor_parallel
Automatically split your PyTorch models on multiple GPUs for training & inference
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up