akaitsuki-ii's Stars
zero-peak/ZeroOmega
Manage and switch between multiple proxies quickly & easily.
clash-verge-rev/clash-verge-rev
Continuation of Clash Verge - A Clash Meta GUI based on Tauri (Windows, macOS, Linux)
suno-ai/bark
🔊 Text-Prompted Generative Audio Model
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
kvcache-ai/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
xdit-project/xDiT
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
InternLM/lmdeploy
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
sgl-project/sglang
SGLang is a fast serving framework for large language models and vision language models.
mem0ai/mem0
The Memory layer for your AI apps
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
microsoft/MInference
To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparsity, reducing pre-filling inference latency by up to 10x on an A100 while maintaining accuracy.
gpu-mode/lectures
Material for gpu-mode lectures
Morakito/Real-Time-Rendering-4th-CN
Chinese translation of Real-Time Rendering, 4th Edition (RTR4)
naklecha/llama3-from-scratch
llama3 implementation one matrix multiplication at a time
hiyouga/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
huggingface/trl
Train transformer language models with reinforcement learning.
chenzomi12/AISystem
AISystem covers the full AI systems stack, including AI chips, AI compilers, and AI inference and training frameworks.
2noise/ChatTTS
A generative speech model for daily dialogue.
NVIDIA-Merlin/Transformers4Rec
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
NVIDIA-Merlin/HugeCTR
HugeCTR is a high-efficiency GPU framework designed for Click-Through-Rate (CTR) estimation training.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
DefTruth/CUDA-Learn-Notes
🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
facebookresearch/generative-recommenders
Repository hosting code used to reproduce results in "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
HFAiLab/hai-platform
A high-performance deep learning training platform with task-level time-sharing scheduling of GPU compute.
deepseek-ai/DeepSeek-V2
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
BlackSamorez/tensor_parallel
Automatically split your PyTorch models on multiple GPUs for training & inference
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up