demonsan's Stars
ROCm/aiter
AI Tensor Engine for ROCm
deepseek-ai/FlashMLA
FlashMLA: Efficient MLA decoding kernels
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
ROCm/composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
mpoullet/tmplbook
C++ Templates - The Complete Guide, 2nd Edition
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving for Local Deployment
meta-llama/llama-cookbook
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family and how to use the models on various provider services.
ROCm/amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
xtekky/gpt4free
The official gpt4free repository | various collection of powerful language models | o3 and deepseek r1, gpt-4.5
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
ROCm/MISA
Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen)
trojan-gfw/trojan
An unidentifiable mechanism that helps you bypass GFW.
TalkUHulk/yolov3-TT100k
A TT100k (traffic sign) model trained with yolov3
YonghaoHe/LFD-A-Light-and-Fast-Detector
LFD is a major update to LFFD. Generally, LFD is a multi-class object detector characterized by light weight, low inference latency, and superior precision. It is intended for real-world applications.
jamesstringerparsec/Easy-GPU-PV
A Project dedicated to making GPU Partitioning on Windows easier!
IntelLabs/distiller
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
AkashB23/Quantization-of-DNNs-with-Tensorflow
Includes all files needed to produce a TFLite model starting from checkpoints
erendn/pytorch-compression-for-mcu
Deep Compression for PyTorch Model Deployment on Microcontrollers
lswzjuer/pytorch-quantity
An automated 8-bit quantization conversion tool for PyTorch (post-training quantization based on KL divergence)
Jermmy/pytorch-quantization-demo
A simple network quantization demo using pytorch from scratch.
cornell-zhang/dnn-quant-ocs
DNN quantization with outlier channel splitting
flame/how-to-optimize-gemm
chasingegg/Winconv
Implementation of the Winograd minimal convolution algorithm on Intel Architecture
imarvinle/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
balloonwj/mybooksources
Companion source code for the book 《C++ 服务器开发精髓》 (Essentials of C++ Server Development)
agedcat/WebServer
A C++ web server for Ubuntu 20
forthespada/InterviewGuide
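🔥🔥 "InterviewGuide" is Axiu's (阿秀) record of years of self-taught computer science on the way from campus to the workplace, together with a collection of articles by junior schoolmates summarizing their campus and autumn recruitment experience. It includes, but is not limited to, study notes on C/C++, Golang, JavaScript, Vue, operating systems, data structures, computer networks, MySQL, and Redis. Keep learning, keep growing!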