qinsiyuan-cool
I am an undergraduate majoring in software engineering. Communication and guidance are welcome.
qinsiyuan-cool's Stars
inside-compiler/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
inside-compiler/Inside-LLVM-Code-Gen
Companion example code for the book 《深入理解LLVM代码生成》 (Inside LLVM Code Generation)
0voice/introduce_c-cpp_manual
A beginner-oriented collection for learning C/C++, gathering developers' open-source mini projects, tools, frameworks, and games, plus videos, books, interview and algorithm questions, and technical articles.
LearningInfiniTensor/TinyInfiniTensor
hahnyuan/LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
liguodongiot/llm-action
This project shares the technical principles behind large language models along with hands-on experience (LLM engineering and production deployment).
Dao-AILab/flash-attention
Fast and memory-efficient exact attention
AdvancedCompiler/AdvancedCompiler
Homepage of the Advanced Compiler Lab
0voice/interview_internal_reference
An up-to-date (2023) compilation of technical interview questions from Alibaba, Tencent, Baidu, Meituan, ByteDance (Toutiao), and other companies, with answers and analysis from expert interviewers.
ModelTC/lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Bruce-Lee-LY/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores, via the WMMA API and MMA PTX instructions.
THU-DSP-LAB/llvm-project
LLVM OpenCL C compiler suite for ventus GPGPU
SqueezeAILab/LLMCompiler
[ICML 2024] LLMCompiler: An LLM Compiler for Parallel Function Calling
zjin-lcf/HeCBench
ai-dawang/PlugNPlay-Modules
hustvl/Vim
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
NVlabs/instant-ngp
Instant neural graphics primitives: lightning fast NeRF and more
microsoft/BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
microsoft/T-MAC
Low-bit LLM inference on CPU with lookup table
karpathy/nano-llama31
nanoGPT-style version of Llama 3.1
zjhellofss/KuiperInfer
A great project for campus recruitment (fall/spring hiring) and internships! Implement a high-performance deep learning inference library from scratch, step by step, with support for inference on Llama 2, UNet, YOLOv5, ResNet, and other models.
DefTruth/Awesome-LLM-Inference
📖A curated list of Awesome LLM/VLM Inference Papers with codes, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉
zjhellofss/KuiperLLama
A great project for campus recruitment (fall/spring hiring) and internships: build, from scratch, an LLM inference framework supporting Llama 2/3 and Qwen2.5.
openmlsys/openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
InfiniTensor/InfiniTensor
wangzhaode/llm-export
llm-export can export LLM models to ONNX.
HeKun-NVIDIA/CUDA-Programming-Guide-in-Chinese
This is a Chinese translation of the CUDA Programming Guide
karpathy/llm.c
LLM training in simple, raw C/CUDA
Liu-xiandong/How_to_optimize_in_GPU
A series of GPU optimization topics that introduces in detail how to optimize CUDA kernels, covering several basic kernel optimizations including elementwise, reduce, SGEMV, and SGEMM. The performance of these kernels is at or near the theoretical limit.
Cjkkkk/CUDA_gemm
A simple, high-performance CUDA GEMM implementation.