demonsan

demonsan's Stars

ROCm/aiter
AI Tensor Engine for ROCm
Language:Python6616
deepseek-ai/FlashMLA
FlashMLA: Efficient MLA decoding kernels
Language:C++11.3k797
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
Language:Cuda2.2k126
flashinfer-ai/flashinfer
FlashInfer: Kernel Library for LLM Serving
Language:Cuda2.4k253
BBuf/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
Language:Cuda2k181
ROCm/composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
Language:C++364162
mpoullet/tmplbook
C++ Templates - The Complete Guide, 2nd Edition
Language:C++13957
vllm-project/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
Language:Python41.9k6.3k
SJTU-IPADS/PowerInfer
High-speed Large Language Model Serving for Local Deployment
Language:C++8.2k427
meta-llama/llama-cookbook
Welcome to the Llama Cookbook! This is your go-to guide for building with Llama: getting started with inference, fine-tuning, and RAG. We also show you how to solve end-to-end problems using the Llama model family on various provider services.
Language:Jupyter Notebook16.5k2.4k
ROCm/amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
Language:Python798
xtekky/gpt4free
The official gpt4free repository | various collection of powerful language models | o3 and deepseek r1, gpt-4.5
Language:Python63.8k13.6k
AIGC-Audio/AudioGPT
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Language:Python10.1k863
ROCm/MISA
Machine Intelligence Shader Autogen. AMDGPU ML shader code generator. (previously iGEMMgen)
Language:Python3414
trojan-gfw/trojan
An unidentifiable mechanism that helps you bypass GFW.
Language:C++19.2k3.1k
TalkUHulk/yolov3-TT100k
A TT100k (traffic sign) model trained with YOLOv3
Language:Python124
YonghaoHe/LFD-A-Light-and-Fast-Detector
LFD is a big update upon LFFD. Generally, LFD is a multi-class object detector characterized by its light weight, low inference latency, and superior precision. It is for real-world applications.
Language:Python41982
jamesstringerparsec/Easy-GPU-PV
A Project dedicated to making GPU Partitioning on Windows easier!
Language:PowerShell4.7k485
IntelLabs/distiller
Neural Network Distiller by Intel AI Lab: a Python package for neural network compression research. https://intellabs.github.io/distiller
Language:Jupyter Notebook4.4k803
AkashB23/Quantization-of-DNNs-with-Tensorflow
Includes all necessary files to arrive at a TFlite starting with checkpoints
Language:Python2
erendn/pytorch-compression-for-mcu
Deep Compression for PyTorch Model Deployment on Microcontrollers
Language:Python185
lswzjuer/pytorch-quantity
An 8bit automated quantization conversion tool for the pytorch (Post-training quantization based on KL divergence)
Language:Python332
Jermmy/pytorch-quantization-demo
A simple network quantization demo using pytorch from scratch.
Language:Python52198
cornell-zhang/dnn-quant-ocs
DNN quantization with outlier channel splitting
Language:Python11218
flame/how-to-optimize-gemm
Language:C1.8k356
chasingegg/Winconv
implementation of winograd minimal convolution algorithm on Intel Architecture
Language:C++3910
imarvinle/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
918233
balloonwj/mybooksources
Companion source code for the book 《C++ 服务器开发精髓》 (Essentials of C++ Server Development)
Language:C21199
agedcat/WebServer
A C++ web server for Ubuntu 20
Language:C++25046
forthespada/InterviewGuide
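🔥🔥 "InterviewGuide" is A-Xiu's record of years of self-taught computer science, from campus to the workplace, together with a collection of articles summarizing junior students' experience with campus and autumn recruitment. It covers, among other topics, C/C++, Golang, JavaScript, Vue, operating systems, data structures, computer networks, MySQL, and Redis. Keep learning, keep growing!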
5.5k1.5k