Pinned Repositories
awesome-machine-learning-in-compilers
Must-read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation
barracuda
BARRACUDA: Binary-level Analysis of Runtime RAces in CUDA programs
CameraFeature
Feature descriptions of cameras of interest
clang-llvm-tutorial
clang & llvm examples, e.g. AST Interpreter, Function Pointer Analysis, Value Range Analysis, Data-Flow Analysis, Andersen Pointer Analysis, LLVM Backend...
cmake_study
study cmake
CppCon2014
Speaker materials from CppCon 2014
Decoding-CUDA-Binary
taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
TensorComprehensions
A domain specific language to express machine learning workloads.
cutlass
CUDA Templates for Linear Algebra Subroutines
dongxiao92's Repositories
dongxiao92/awesome-machine-learning-in-compilers
Must-read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation
dongxiao92/barracuda
BARRACUDA: Binary-level Analysis of Runtime RAces in CUDA programs
dongxiao92/CameraFeature
Feature descriptions of cameras of interest
dongxiao92/clang-llvm-tutorial
clang & llvm examples, e.g. AST Interpreter, Function Pointer Analysis, Value Range Analysis, Data-Flow Analysis, Andersen Pointer Analysis, LLVM Backend...
dongxiao92/cmake_study
study cmake
dongxiao92/taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
dongxiao92/cuda-convnet
My fork of Alex Krizhevsky's cuda-convnet from 2013 where I added dropout, among other features.
dongxiao92/cutlass
CUDA Templates for Linear Algebra Subroutines
dongxiao92/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
dongxiao92/ebook
classic books of computer science!
dongxiao92/flexible-gemm
flexible-gemm conv of deepcore
dongxiao92/gas
dongxiao92/gemmlowp
Low-precision matrix multiplication
dongxiao92/GVProf
GVProf: A Value Profiler for GPU-based Clusters
dongxiao92/iGUARD
dongxiao92/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
dongxiao92/MegEngine
MegEngine is a fast, scalable, and easy-to-use deep learning framework with support for automatic differentiation
dongxiao92/metrics
📊 An infographics generator with 30+ plugins and 200+ options to display stats about your GitHub account and render them as SVG, Markdown, PDF or JSON!
dongxiao92/modern-cpp-tutorial
📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/
dongxiao92/parallel-hashmap
A header-only, very fast and memory-friendly hash map.
dongxiao92/ppl.cv
ppl.cv is a high-performance image processing library of openPPL supporting x86 and cuda platforms.
dongxiao92/ppl.nn
A primitive library for neural network
dongxiao92/SPSC_Queue
A highly optimized single producer single consumer message queue C++ template
dongxiao92/the-art-of-command-line
Master the command line, in one page
dongxiao92/tiny-cuda-nn
Lightning fast & tiny C++/CUDA neural network framework
dongxiao92/triton
Development repository for the Triton language and compiler
dongxiao92/turingas
Assembler for NVIDIA Volta and Turing GPUs
dongxiao92/uarch-bench
A benchmark for low-level CPU micro-architectural features
dongxiao92/YHs_Sample
Yinghan's Code Sample
dongxiao92/ZenDNN