cowyyy's Stars
tensorflow/tensorflow
An Open Source Machine Learning Framework for Everyone
ONNC/onnc-tutorial
ONNC/onnc
Open Neural Network Compiler
graphcore/popart
Poplar Advanced Runtime for the IPU
OpenGPGPU/opengpgpu
HuiiJi/ez_ISP
An easy ISP (ez_ISP) for RAW-to-RGB conversion.
graphcore/llvm-project-fork
Fork of LLVM Project containing a Colossus IPU backend implementation
MegEngine/MegCC
MegCC is a deep learning model compiler with an ultra-lightweight, efficient, and easily portable runtime.
gpgpu-sim/gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism, as well as a performance visualization tool, AerialVision, and an integrated energy model, GPUWattch.
rui314/chibicc
A small C compiler
rui314/8cc
A Small C Compiler
Liu-xiandong/How_to_optimize_in_GPU
A series of GPU optimization topics that introduces in detail how to optimize CUDA kernels, covering several basic kernel optimizations, including elementwise, reduce, sgemv, sgemm, etc. The performance of these kernels is basically at or near the theoretical limit.
chncwang/InsNet
InsNet Runs Instance-dependent Neural Networks with Padding-free Dynamic Batching.
YellowOldOdd/SDBI
Simple Dynamic Batching Inference
BBuf/tvm_mlir_learn
A collection of compiler learning resources.
amhndu/SimpleNES
An NES emulator in C++
NVlabs/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
herumi/xbyak
a JIT assembler for x86(IA-32)/x64(AMD64, x86-64) MMX/SSE/SSE2/SSE3/SSSE3/SSE4/FPU/AVX/AVX2/AVX-512 by C++ header
666DZY666/micronet
micronet, a model compression and deployment library. Compression: 1. quantization: quantization-aware training (QAT), high-bit (>2b) (DoReFa / "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"), low-bit (≤2b) / ternary and binary (TWN/BNN/XNOR-Net); post-training quantization (PTQ), 8-bit (TensorRT); 2. pruning: normal, regular, and group convolutional channel pruning; 3. group convolution structure; 4. batch-normalization fusion for quantization. Deployment: TensorRT, fp32/fp16/int8 (PTQ calibration), op adaptation (upsample), dynamic shape.
Zhengtq/ncnn_breakdown
A breakdown of NCNN
xylcbd/EasyCNN
An easy convolutional neural network
ArtyZe/yolo_quantization
Based on the paper "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"
xboot/libonnx
A lightweight, portable pure C99 onnx inference engine for embedded devices with hardware acceleration support.
senlinuc/caffe_ocr
An experimental research project on mainstream OCR algorithms; currently implements a CNN+BLSTM+CTC architecture.
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
wanghaisheng/awesome-ocr
A curated list of promising OCR resources
baidu-research/warp-ctc
Fast parallel CTC.
tmbdev/clstm
A small C++ implementation of LSTM networks, focused on OCR.
MartinChan3/ClipperDocCN
The documentation of ClipperLib in Chinese
JDAI-CV/dabnn
dabnn is an accelerated binary neural network inference framework for mobile platforms