dongxiao92

Architect at NVIDIA

NVIDIA, Shanghai, China

Pinned Repositories

awesome-machine-learning-in-compilers
Must-read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation
barracuda
BARRACUDA: Binary-level Analysis of Runtime RAces in CUDA programs
Language: C++
CameraFeature
Feature descriptions of cameras of interest
Language: TeX
clang-llvm-tutorial
clang & llvm examples, e.g. AST Interpreter, Function Pointer Analysis, Value Range Analysis, Data-Flow Analysis, Andersen Pointer Analysis, LLVM Backend...
Language: C++
cmake_study
Study notes on CMake
Language: CMake
CppCon2014
Speaker materials from CppCon 2014
Language: C++
Decoding-CUDA-Binary
Language: C++
taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
Language: C++
TensorComprehensions
A domain specific language to express machine learning workloads.
Language: C++
cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++

dongxiao92's Repositories

dongxiao92/awesome-machine-learning-in-compilers
Must-read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation
dongxiao92/barracuda
BARRACUDA: Binary-level Analysis of Runtime RAces in CUDA programs
Language: C++
dongxiao92/CameraFeature
Feature descriptions of cameras of interest
Language: TeX
dongxiao92/clang-llvm-tutorial
clang & llvm examples, e.g. AST Interpreter, Function Pointer Analysis, Value Range Analysis, Data-Flow Analysis, Andersen Pointer Analysis, LLVM Backend...
Language: C++
dongxiao92/cmake_study
Study notes on CMake
Language: CMake
dongxiao92/taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
Language: C++
dongxiao92/cuda-convnet
My fork of Alex Krizhevsky's cuda-convnet from 2013 where I added dropout, among other features.
Language: Cuda
dongxiao92/cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++
dongxiao92/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python
dongxiao92/ebook
Classic books of computer science
dongxiao92/flexible-gemm
flexible-gemm conv of deepcore
Language: C
dongxiao92/gas
Language: C++
dongxiao92/gemmlowp
Low-precision matrix multiplication
Language: C++
dongxiao92/GVProf
GVProf: A Value Profiler for GPU-based Clusters
Language: Python
dongxiao92/iGUARD
Language: Cuda
dongxiao92/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
Language: Jupyter Notebook
dongxiao92/MegEngine
MegEngine is a fast, scalable, and easy-to-use deep learning framework with support for automatic differentiation
dongxiao92/metrics
📊 An infographics generator with 30+ plugins and 200+ options to display stats about your GitHub account and render them as SVG, Markdown, PDF or JSON!
Language: JavaScript
dongxiao92/modern-cpp-tutorial
📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/
Language: C++
dongxiao92/parallel-hashmap
A header-only, very fast and memory-friendly hash map.
Language: C++
dongxiao92/ppl.cv
ppl.cv is a high-performance image processing library from openPPL that supports x86 and CUDA platforms.
Language: C++
dongxiao92/ppl.nn
A primitive library for neural network
Language: C++
dongxiao92/SPSC_Queue
A highly optimized single-producer, single-consumer message queue C++ template
dongxiao92/the-art-of-command-line
Master the command line, in one page
dongxiao92/tiny-cuda-nn
Lightning fast & tiny C++/CUDA neural network framework
Language: C++
dongxiao92/triton
Development repository for the Triton language and compiler
dongxiao92/turingas
Assembler for NVIDIA Volta and Turing GPUs
Language: Python
dongxiao92/uarch-bench
A benchmark for low-level CPU micro-architectural features
Language: C++
dongxiao92/YHs_Sample
Yinghan's Code Sample
dongxiao92/ZenDNN
Language: C++