cowyyy's Stars
tensorflow/tensorflow
An Open Source Machine Learning Framework for Everyone
ONNC/onnc-tutorial
ONNC/onnc
Open Neural Network Compiler
graphcore/popart
Poplar Advanced Runtime for the IPU
OpenGPGPU/opengpgpu
HuiiJi/ez_ISP
An easy ISP (ez_ISP) for RAW-to-RGB conversion.
graphcore/llvm-project-fork
Fork of LLVM Project containing a Colossus IPU backend implementation
MegEngine/MegCC
MegCC is a deep learning model compiler with an ultra-lightweight, efficient, and easily portable runtime.
gpgpu-sim/gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
rui314/chibicc
A small C compiler
rui314/8cc
A Small C Compiler
Liu-xiandong/How_to_optimize_in_GPU
A series of GPU optimization topics that introduces in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
chncwang/InsNet
InsNet Runs Instance-dependent Neural Networks with Padding-free Dynamic Batching.
YellowOldOdd/SDBI
Simple Dynamic Batching Inference
BBuf/tvm_mlir_learn
A collection of compiler learning resources.
amhndu/SimpleNES
An NES emulator in C++
NVlabs/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
herumi/xbyak
a JIT assembler for x86(IA-32)/x64(AMD64, x86-64) MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX/AVX2/AVX-512 by C++ header
666DZY666/micronet
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, regular, and group convolutional channel pruning; 3. group convolution structure; 4. batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.
Zhengtq/ncnn_breakdown
A breakdown of NCNN
xylcbd/EasyCNN
An easy convolutional neural network
ArtyZe/yolo_quantization
Based on the paper "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"
xboot/libonnx
A lightweight, portable pure C99 onnx inference engine for embedded devices with hardware acceleration support.
senlinuc/caffe_ocr
An experimental research project on mainstream OCR algorithms; currently implements a CNN+BLSTM+CTC architecture.
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
wanghaisheng/awesome-ocr
A curated list of promising OCR resources
baidu-research/warp-ctc
Fast parallel CTC.
tmbdev/clstm
A small C++ implementation of LSTM networks, focused on OCR.
MartinChan3/ClipperDocCN
The documentation of ClipperLib in Chinese
JDAI-CV/dabnn
dabnn is an accelerated binary neural network inference framework for mobile platforms