dongxiao92's Stars
hpcaitech/ColossalAI
Making large AI models cheaper, faster and more accessible
geekan/HowToLiveLonger
A programmer's guide to living longer
Tencent/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
google-deepmind/alphafold
Open source code for AlphaFold.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
Tencent/TNN
TNN: a uniform deep learning inference framework for mobile, desktop, and server, developed by Tencent Youtu Lab and Guangying Lab. TNN is distinguished by several outstanding features, including cross-platform capability, high performance, model compression, and code pruning. Based on ncnn and Rapidnet, TNN further strengthens support and performance optimization for mobile devices, and also draws on the extensibility and high performance of existing open source efforts. TNN has been deployed in multiple Tencent apps, such as Mobile QQ, Weishi, and Pitu. Contributions are welcome to collaborate with us and make TNN a better framework.
CppCon/CppCon2014
Speaker materials from CppCon 2014
google/XNNPACK
High-efficiency floating-point neural network inference operators for mobile, server, and Web
NVIDIA/cub
[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl
ELS-RD/kernl
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
zwang4/awesome-machine-learning-in-compilers
Must read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation
openppl-public/ppl.nn
A primitive library for neural networks
tensor-compiler/taco
The Tensor Algebra Compiler (taco) computes sparse tensor expressions on CPUs and GPUs
huawei-noah/bolt
Bolt is a deep learning library with high performance and heterogeneous flexibility.
travisdowns/uarch-bench
A benchmark for low-level CPU micro-architectural features
openppl-public/ppl.cv
ppl.cv is a high-performance image processing library of openPPL supporting various platforms.
NVIDIA/cudnn-frontend
cudnn_frontend provides a C++ wrapper for the cuDNN backend API and samples showing how to use it
microsoft/nn-Meter
A DNN inference latency prediction toolkit for accurately modeling and predicting the latency on diverse edge devices.
ROCm/composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
gtcasl/gpuocelot
GPU Ocelot: a dynamic compilation framework for PTX
Yinghan-Li/YHs_Sample
Yinghan's Code Sample
lijiansong/clang-llvm-tutorial
clang & llvm examples, e.g. AST Interpreter, Function Pointer Analysis, Value Range Analysis, Data-Flow Analysis, Andersen Pointer Analysis, LLVM Backend...
google-research/sputnik
A library of GPU kernels for sparse matrix operations.
tvmai/meetup-slides
Place for meetup slides
cmdbug/TNN_Demo
🍉 Study notes on deploying TNN on mobile, with support for Android and iOS.
GVProf/GVProf
GVProf: A Value Profiler for GPU-based Clusters
codyjrivera/tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA
zhenhuaw-me/qnnpack
Explained QNNPACK Implementation
XiuYuLi/flexible-gemm
flexible-gemm conv of deepcore
chenxuhao/caffe-escoin
Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs