Pinned Repositories
.tmux
🇫🇷 Oh My Tmux! My pretty + versatile tmux configuration that just works (imho the best tmux configuration)
500lines
500 Lines or Less
aemb
Multi-threaded 32-bit embedded core family.
AES
Advanced Encryption Standard (AES) SystemVerilog Core
catapult
Catapult
CS110
Principles of Computer Systems
cs228-material
Teaching materials for the probabilistic graphical models and deep learning classes at Stanford
Distributed-Systems
Study notes and translation of the MIT course "Distributed Systems"
hw_interview_questions
A collection of commonly asked RTL design interview questions
pengwubj's Repositories
pengwubj/catapult
Catapult
pengwubj/.tmux
🇫🇷 Oh My Tmux! My pretty + versatile tmux configuration that just works (imho the best tmux configuration)
pengwubj/CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully :)
pengwubj/CUDA-Learn-Notes
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
pengwubj/CUDAsmith
A CUDA compiler fuzzer
pengwubj/deepfloat
An exploration of log domain "alternative floating point" for hardware ML/AI accelerators.
pengwubj/DeepLearningSystem
An introduction to the core principles of deep learning systems.
pengwubj/DissectingTensorCores
pengwubj/flash-attention
Fast and memory-efficient exact attention
pengwubj/Fractional-GPUs
Splits a single NVIDIA GPU into multiple partitions with complete compute and memory isolation (with respect to performance) between the partitions
pengwubj/gpu-benches
A collection of benchmarks to measure basic GPU capabilities
pengwubj/how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
pengwubj/leetcode
LeetCode Problems' Solutions
pengwubj/llvm-project
This is the AMD-maintained fork of the LLVM git repository. This repository accepts pull requests and issues related to AMD fork-specific topics (amd/*). For all other issues/PRs, please submit upstream at https://github.com/llvm/llvm-project.
pengwubj/models
Pre-trained and Reproduced Deep Learning Models (classic reproduced models)
pengwubj/NBAssembler
Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs.
pengwubj/NEMU
pengwubj/netron
Visualizer for deep learning and machine learning models
pengwubj/nsight-training
Training material for Nsight developer tools
pengwubj/nvbit_tools
pengwubj/one-key-hidpi
Enable macOS HiDPI and have a native setting.
pengwubj/open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
pengwubj/Project-Zipline
Defines a lossless compressed data format that is independent of CPU type, operating system, file system, and character set, and is suitable for compression using the XP10 algorithm.
pengwubj/riscv-profiles
RISC-V Architecture Profiles
pengwubj/riscv-soc-book
Everything you need to know about RISC-V
pengwubj/SGEMM-SASS-Annotation
pengwubj/spf13-vim
The ultimate vim distribution
pengwubj/swerv_eh1
A directory of Western Digital’s RISC-V SweRV Cores
pengwubj/tensor-cores-numerical-behavior
Test suite for probing the numerical behavior of NVIDIA tensor cores
pengwubj/transformers-benchmarks
Real Transformer TeraFLOPS on various GPUs