dongxiao92's Stars
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
geekan/HowToLiveLonger
A programmer's guide to living longer
Tencent/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
google-deepmind/alphafold
Open source code for AlphaFold.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
Tencent/TNN
TNN: a uniform deep learning inference framework for mobile, desktop, and server, developed by Tencent Youtu Lab and Guangying Lab. TNN is distinguished by several outstanding features, including cross-platform capability, high performance, model compression, and code pruning. Based on ncnn and Rapidnet, TNN further strengthens support and performance optimization for mobile devices, and also draws on the extensibility and high performance of existing open source efforts. TNN has been deployed in multiple Tencent apps, such as Mobile QQ, Weishi, and Pitu. Contributions are welcome to collaborate with us and make TNN a better framework.
CppCon/CppCon2014
Speaker materials from CppCon 2014
google/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
NVIDIA/cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
ELS-RD/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
zwang4/awesome-machine-learning-in-compilers
Must read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation
openppl-public/ppl.nn
A primitive library for neural networks
tensor-compiler/taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
huawei-noah/bolt
Bolt is a deep learning library with high performance and heterogeneous flexibility.
travisdowns/uarch-bench
A benchmark for low-level CPU micro-architectural features
openppl-public/ppl.cv
ppl.cv is a high-performance image processing library of openPPL supporting various platforms.
NVIDIA/cudnn-frontend
cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it
microsoft/nn-Meter
A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.
ROCm/composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
gtcasl/gpuocelot
GPU Ocelot: a dynamic compilation framework for PTX
Yinghan-Li/YHs_Sample
Yinghan's Code Sample
lijiansong/clang-llvm-tutorial
clang & llvm examples, e.g. AST Interpreter, Function Pointer Analysis, Value Range Analysis, Data-Flow Analysis, Andersen Pointer Analysis, LLVM Backend...
google-research/sputnik
A library of GPU kernels for sparse matrix operations.
tvmai/meetup-slides
Place for meetup slides
cmdbug/TNN_Demo
🍉 Study notes on deploying TNN on mobile, with support for Android and iOS.
GVProf/GVProf
GVProf: A Value Profiler for GPU-based Clusters
codyjrivera/tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
zhenhuaw-me/qnnpack
Explained QNNPACK Implementation
XiuYuLi/flexible-gemm
flexible-gemm conv of deepcore
chenxuhao/caffe-escoin
Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs