Pinned Repositories
abseil-cpp
Abseil Common Libraries (C++)
benchmark
A microbenchmark support library
Blog
My Blog
cmake_tutorial
CUDA-Learn-Notes
🎉 CUDA/C++ notes / hand-written CUDA kernels for large models / tech blog, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
face-detection-based-on-caffe
flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
flashinfer
FlashInfer: Kernel Library for LLM Serving
how-to-optimize-gemm
ARM RowMajor sgemm optimization
nnvm_tvm_demos
Deploy NNVM/TVM to Android
sgxu's Repositories
sgxu/nnvm_tvm_demos
Deploy NNVM/TVM to Android
sgxu/abseil-cpp
Abseil Common Libraries (C++)
sgxu/benchmark
A microbenchmark support library
sgxu/Blog
My Blog
sgxu/cmake_tutorial
sgxu/CUDA-Learn-Notes
🎉 CUDA/C++ notes / hand-written CUDA kernels for large models / tech blog, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
sgxu/face-detection-based-on-caffe
sgxu/flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
sgxu/flashinfer
FlashInfer: Kernel Library for LLM Serving
sgxu/how-to-optimize-gemm
ARM RowMajor sgemm optimization
sgxu/iOS-LinkMapAnalyzer
Parses the linkmap file of an iOS project to make it easy to analyze how much of the package size each module takes up
sgxu/libhv
Like libevent and libuv, libhv provides an event loop with non-blocking IO and timers, but with a simpler API and richer protocols.
sgxu/LinkMapParser
A tool for parsing iOS app link map file.
sgxu/MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
sgxu/TensorRT-Developer_Guide_in_Chinese
Advanced TensorRT usage
sgxu/WebServer
A C++ High Performance Web Server
sgxu/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web