Raphael-Hao's Stars
deepseek-ai/DeepSeek-V3
astral-sh/uv
An extremely fast Python package and project manager, written in Rust.
geekan/MetaGPT
🌟 The Multi-Agent Framework: First AI Software Company, Towards Natural Language Programming
deepseek-ai/awesome-deepseek-integration
Integrate the DeepSeek API into popular software
stanfordnlp/dspy
DSPy: The framework for programming—not prompting—language models
waydabber/BetterDisplay
Unlock your displays on your Mac! Flexible HiDPI scaling, XDR/HDR extra brightness, virtual screens, DDC control, extra dimming, PIP/streaming, EDID override and lots more!
kvcache-ai/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
deepseek-ai/FlashMLA
FlashMLA: Efficient MLA kernels
modelscope/ms-swift
Use PEFT or Full-parameter to CPT/SFT/DPO/GRPO 500+ LLMs (Qwen3, Qwen3-MoE, Llama4, GLM4.5, InternLM3, DeepSeek-R1, ...) and 200+ MLLMs (Qwen2.5-VL, Qwen2.5-Omni, InternVL3.5, Ovis2.5, Llava, GLM4v, Phi4, ...) (AAAI 2025).
deepseek-ai/3FS
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
deepseek-ai/DeepEP
DeepEP: an efficient expert-parallel communication library
deepseek-ai/open-infra-index
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
meta-llama/llama-stack
Composable building blocks to build Llama Apps
deepseek-ai/DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
NVIDIA/DALI
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
pytorch/torchtitan
A PyTorch native platform for training generative AI models
deepseek-ai/DualPipe
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
NVIDIA/cccl
CUDA Core Compute Libraries
tile-ai/tilelang
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
deepseek-ai/EPLB
Expert Parallelism Load Balancer
mit-han-lab/nunchaku
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
deepseek-ai/profile-data
Analyze computation-communication overlap in V3/R1.
zhihu/ZhiLight
A highly optimized LLM inference acceleration engine for Llama and its variants.
alibaba/rtp-llm
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
aliyun/SimAI
vision-x-nyu/thinking-in-space
Official repo and evaluation implementation of VSI-Bench
geohot/cuda_ioctl_sniffer
Sniff CUDA ioctls
bytedance/InfiniStore
KV cache store for distributed LLM inference
SubjectNoi/RTtention