alohali's Stars
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
BVLC/caffe
Caffe: a fast open framework for deep learning.
Tencent/ncnn
ncnn is a high-performance neural network inference framework optimized for the mobile platform
iperov/DeepFaceLab
DeepFaceLab is the leading software for creating deepfakes.
horovod/horovod
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
Tencent/TNN
TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop and server. TNN is distinguished by several outstanding features, including its cross-platform capability, high performance, model compression and code pruning. Based on ncnn and Rapidnet, TNN further strengthens the support and performance optimization for mobile devices, and also draws on the good extensibility and high performance of existing open source efforts. TNN has been deployed in multiple Apps from Tencent, such as Mobile QQ, Weishi, Pitu, etc. Contributions are welcome: work in collaboration with us and make TNN a better framework.
alibaba/x-deeplearning
An industrial deep learning framework for high-dimension sparse data
asmjit/asmjit
Low-latency machine code generation
Tencent/FaceDetection-DSFD
High-precision dual-branch face detector from Tencent Youtu
basicmi/AI-Chip
A list of ICs and IPs for AI, Machine Learning and Deep Learning.
OpenPPL/ppl.nn
A primitive library for neural networks
tensorflow/benchmarks
A benchmark framework for TensorFlow
DeepRec-AI/DeepRec
DeepRec is a high-performance recommendation deep learning framework based on TensorFlow. It is hosted as an incubation project in the LF AI & Data Foundation.
NVIDIA-Merlin/NVTabular
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.
NVIDIA-Merlin/HugeCTR
HugeCTR is a high-efficiency GPU framework designed for training Click-Through-Rate (CTR) estimation models
alibaba/BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
NVIDIA/caffe
Caffe: a fast open framework for deep learning.
tpoisonooo/how-to-optimize-gemm
row-major matmul optimization
tensorflow/recommenders-addons
Additional utils and helpers to extend TensorFlow when building recommendation systems, contributed and maintained by SIG Recommenders.
krrishnarraj/clpeak
A tool which profiles OpenCL devices to find their peak capacities
ROCm/composable_kernel
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
milakov/nnForge
Convolutional neural networks C++ framework with CPU and GPU (CUDA) backends
ROCm/flash-attention
Fast and memory-efficient exact attention
shadowsocks/qtun
Yet another SIP003 plugin based on IETF-QUIC
alohali/benchmark-models
benchmark models for TNN, ncnn, MNN
michael-lehn/ulmBLAS-core
NeymarL/MIPS_CPU
5-Segment Pipeline MIPS CPU
diaosj/diaosj.github.io
Keep writing
wyzero/tensorflow
An Open Source Machine Learning Framework for Everyone