Pinned Repositories
3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
CUDA-Learn-Note
🎉 CUDA notes / frequently asked interview questions / C++ notes. Personal notes, updated irregularly: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
cutlass_quant
Playing with quantization
dtlzhuangz
EAGLE
[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
EETQ
Easy and Efficient Quantization for Transformers
FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
How_to_optimize_in_GPU
This is a series on GPU optimization topics, covering in detail how to optimize CUDA kernels. It introduces several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
HPC-Learning-Notes
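Study notes on high-performance computing, including notes and code demos for related topics; still being improved. If you find it helpful, please give it a Star. It helps the author a lot, thank you!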
dtlzhuangz's Repositories
dtlzhuangz/CUDA-Learn-Note
🎉 CUDA notes / frequently asked interview questions / C++ notes. Personal notes, updated irregularly: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
dtlzhuangz/3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
dtlzhuangz/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
dtlzhuangz/cutlass_quant
Playing with quantization
dtlzhuangz/dtlzhuangz
dtlzhuangz/EAGLE
[ICML'24] EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
dtlzhuangz/EETQ
Easy and Efficient Quantization for Transformers
dtlzhuangz/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.
dtlzhuangz/How_to_optimize_in_GPU
This is a series on GPU optimization topics, covering in detail how to optimize CUDA kernels. It introduces several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
dtlzhuangz/HPC-Learning-Notes
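Study notes on high-performance computing, including notes and code demos for related topics; still being improved. If you find it helpful, please give it a Star. It helps the author a lot, thank you!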
dtlzhuangz/Learn-CUDA-Programming
Learn CUDA Programming, published by Packt
dtlzhuangz/lectures
Material for cuda-mode lectures
dtlzhuangz/test
dtlzhuangz/text-generation-inference
Large Language Model Text Generation Inference
dtlzhuangz/vllm
A high-throughput and memory-efficient inference and serving engine for LLMs