cowyyy's Stars
tensorflow/tensorflow
An Open Source Machine Learning Framework for Everyone
rui314/chibicc
A small C compiler
rui314/8cc
A Small C Compiler
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
amhndu/SimpleNES
An NES emulator in C++
baidu-research/warp-ctc
Fast parallel CTC.
NVlabs/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
BBuf/tvm_mlir_learn
A collection of compiler learning resources.
666DZY666/micronet
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa, "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference") and low-bit (≤2b)/ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, regular, and group convolutional channel pruning; 3. group convolution structure; 4. batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.
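The QAT/PTQ schemes micronet implements all build on the affine quantization of the paper it cites ("Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"). A minimal sketch of that scale/zero-point mapping, with illustrative names rather than micronet's actual API:

```python
# Affine int8 quantization sketch: map floats to [0, 255] via a scale and
# zero point, as in the integer-arithmetic-only inference paper.
# Hypothetical helper names, not micronet's API.

def quantize(xs, num_bits=8):
    """Map floats to unsigned integers via an affine scale/zero-point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(min(xs), 0.0), max(max(xs), 0.0)  # range must include 0
    scale = (hi - lo) / (qmax - qmin) or 1.0
    zero_point = round(qmin - lo / scale)
    qs = [max(qmin, min(qmax, round(x / scale) + zero_point)) for x in xs]
    return qs, scale, zero_point

def dequantize(qs, scale, zero_point):
    """Recover approximate floats from quantized values."""
    return [scale * (q - zero_point) for q in qs]

qs, s, z = quantize([-1.0, 0.0, 1.0, 2.0])
print(qs, z)                                    # [0, 85, 170, 255] 85
print([round(x, 2) for x in dequantize(qs, s, z)])  # [-1.0, 0.0, 1.0, 2.0]
```

PTQ tools pick `scale`/`zero_point` from calibration data; QAT simulates this rounding during training so the network learns to tolerate it.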
herumi/xbyak
A JIT assembler for x86/x64 architectures supporting MMX, SSE (1-4), AVX (1-2, 512), FPU, APX, and AVX10.2
wanghaisheng/awesome-ocr
A curated list of promising OCR resources
senlinuc/caffe_ocr
An experimental project studying mainstream OCR algorithms; currently implements a CNN+BLSTM+CTC architecture.
gpgpu-sim/gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
Liu-xiandong/How_to_optimize_in_GPU
A series of GPU optimization topics, introducing in detail how to optimize CUDA kernels. It covers several basic kernel optimizations, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is basically at or near the theoretical limit.
tmbdev/clstm
A small C++ implementation of LSTM networks, focused on OCR.
JDAI-CV/dabnn
dabnn is an accelerated binary neural network inference framework for mobile platforms
xboot/libonnx
A lightweight, portable, pure C99 ONNX inference engine for embedded devices, with hardware acceleration support.
ONNC/onnc
Open Neural Network Compiler
MegEngine/MegCC
MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port
xylcbd/EasyCNN
An easy convolutional neural network
YellowOldOdd/SDBI
Simple Dynamic Batching Inference
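Dynamic batching at inference time means buffering incoming requests and flushing a batch when it is full or a latency deadline passes. A toy sketch of that policy, under assumed semantics (this is not SDBI's actual interface):

```python
# Server-side dynamic batching sketch: group (arrival_time, payload) requests
# into batches, flushing on batch-size or deadline. Illustrative only;
# SDBI's real interface may differ.

def batch_requests(requests, max_batch=4, max_wait=0.01):
    """Group an iterable of (arrival_time, payload) pairs into batches."""
    batches, current, deadline = [], [], None
    for t, payload in requests:
        # Flush the open batch if it is full or its deadline has passed.
        if current and (len(current) >= max_batch or t >= deadline):
            batches.append(current)
            current = []
        if not current:
            deadline = t + max_wait  # new batch opens a new deadline
        current.append(payload)
    if current:
        batches.append(current)
    return batches

reqs = [(0.000, "a"), (0.002, "b"), (0.015, "c"), (0.016, "d")]
print(batch_requests(reqs))  # → [['a', 'b'], ['c', 'd']]
```

The trade-off is throughput (bigger batches amortize kernel launches) against latency (the first request waits up to `max_wait`).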
HuiiJi/ez_ISP
An easy ISP (ez_ISP) for RAW-to-RGB conversion.
MartinChan3/ClipperDocCN
The documentation of ClipperLib in Chinese
chncwang/InsNet
InsNet Runs Instance-dependent Neural Networks with Padding-free Dynamic Batching.
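"Padding-free" batching replaces the usual pad-to-longest layout with one flat buffer plus per-instance offsets, so no compute is wasted on pad tokens. A toy contrast of the two layouts (the bookkeeping idea only, not InsNet's actual algorithm):

```python
# Contrast padded batching with padding-free (packed) batching for
# variable-length sequences. Illustrative sketch, not InsNet's API.

def padded_batch(seqs, pad=0):
    """Classic batching: pad every sequence to the longest one."""
    width = max(len(s) for s in seqs)
    return [s + [pad] * (width - len(s)) for s in seqs]

def packed_batch(seqs):
    """Padding-free batching: one flat buffer plus per-instance offsets."""
    flat, offsets, pos = [], [], 0
    for s in seqs:
        offsets.append(pos)
        flat.extend(s)
        pos += len(s)
    return flat, offsets

seqs = [[1, 2, 3], [4], [5, 6]]
print(padded_batch(seqs))  # [[1, 2, 3], [4, 0, 0], [5, 6, 0]]
print(packed_batch(seqs))  # ([1, 2, 3, 4, 5, 6], [0, 3, 4])
```

The padded form wastes `3*3 - 6 = 3` slots here; with skewed length distributions the waste dominates, which is what padding-free dynamic batching avoids.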
OpenGPGPU/opengpgpu
ArtyZe/yolo_quantization
Based on the paper "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"
ONNC/onnc-tutorial
Zhengtq/ncnn_breakdown
A breakdown of NCNN
graphcore/llvm-project-fork
Fork of LLVM Project containing a Colossus IPU backend implementation
graphcore/popart
Poplar Advanced Runtime for the IPU