irasin's Stars
Textualize/rich
Rich is a Python library for rich text and beautiful formatting in the terminal.
xiaolincoder/CS-Base
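Illustrated guides to computer networking, operating systems, computer organization, and databases: 1,000 diagrams and 500,000 words in total, demystifying obscure computer science fundamentals so that no rote interview question ("八股文") stays hard to understand! 🚀 Read online: https://xiaolincoding.com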
adam-maj/tiny-gpu
A minimal GPU design in Verilog to learn how GPUs work from the ground up
HazyResearch/ThunderKittens
Tile primitives for speedy kernels
DefTruth/CUDA-Learn-Notes
📚150+ Tensor/CUDA Cores Kernels, ⚡️flash-attn-mma, ⚡️hgemm with WMMA, MMA and CuTe (98%~100% TFLOPS of cuBLAS/FA2 🎉🎉).
ray-project/llm-applications
A comprehensive guide to building RAG-based LLM applications for production.
tip-of-the-week/cpp
C++ Tip Of The Week
boost-ext/ut
C++20 μ(micro)/Unit Testing Framework
gpgpu-sim/gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
lizhe2004/Awesome-LLM-RAG-Application
Resources on LLM applications built with the RAG pattern
ilqvya/random
Random for modern C++ with convenient API
banach-space/clang-tutor
A collection of out-of-tree Clang plugins for teaching and learning
owenliang/qwen-vllm
Tongyi Qianwen (Qwen) vLLM inference deployment demo
PacktPublishing/Learn-LLVM-12
Learn LLVM 12, published by Packt
Cjkkkk/CUDA_gemm
A simple high performance CUDA GEMM implementation.
NVIDIA/nvbandwidth
A tool for bandwidth measurements on NVIDIA GPUs.
blackinkkkxi/RAG_langchain
A simple example of implementing RAG with LangChain
Bruce-Lee-LY/cuda_hgemm
Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instructions.
KnowingNothing/MatmulTutorial
An easy-to-understand TensorOp Matmul tutorial
te42kyfo/gpu-benches
A collection of benchmarks to measure basic GPU capabilities
codeplaysoftware/portBLAS
An implementation of BLAS using the SYCL open standard.
franneck94/CppProjectTemplate
C++ project template with unit-tests, documentation, ci-testing and workflows.
TiledTensor/TiledCUDA
TiledCUDA is a highly efficient kernel template library designed to elevate CUDA C’s level of abstraction for processing tiles.
hunterzju/llvm-tutorial
llvm-tutorial documentation, translation, and code repository
wzsh/wmma_tensorcore_sample
Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core)
nicolaswilde/cuda-tensorcore-hgemm
AyakaGEMM/Hands-on-GEMM
wmmae/wmma_extension
An extension library of WMMA API (Tensor Core API)
MARD1NO/CUDA-PPT
NVIDIA/online-softmax
Benchmark code for the "Online normalizer calculation for softmax" paper