Wenzha0Wu's Stars
jeffhammond/STREAM
STREAM benchmark
jameslinsjtu/swCandle
The micro-benchmark suite to evaluate the micro-architecture of China's home-grown many-core processor SW26010
hngenc/systolic-array
A DSL for Systolic Arrays
Xilinx/inference-server
gabime/spdlog
Fast C++ logging library.
microsoft/MMdnn
MMdnn is a set of tools to help users inter-operate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX and CoreML.
ceccocats/tkDNN
Deep neural network library and toolkit for high-performance inference on NVIDIA Jetson platforms
mlcommons/inference_policies
Issues related to MLPerf™ Inference policies, including rules and suggested changes
mlcommons/inference
Reference implementations of MLPerf™ inference benchmarks
mlcommons/inference_results_v3.0
This repository contains the results and code for the MLPerf™ Inference v3.0 benchmark.
NVlabs/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
nomic-ai/gpt4all
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
dmlc/xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
andersy005/tvm-in-action
TVM stack: exploring the incredible explosion of deep-learning frameworks and how to bring them together
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
cornell-zhang/heterocl
HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
arcsysu/SYsU-lang
A mini, simple and modular compiler for SYsU/SysY (a tiny C). Based on Clang/LLVM/ANTLR4/Bison/Flex.
bytedance/byteir
A model compilation solution for various hardware
amazon-science/FeatGraph
cloneofsimo/lora
Using Low-rank adaptation to quickly fine-tune diffusion models.
tlc-pack/relax
MegEngine/MegEngine
MegEngine is a fast, scalable, easy-to-use deep learning framework with automatic differentiation
Xilinx/Vitis-AI
Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
buaa-hipo/dlcompiler-comparison
A quantitative performance comparison among DL compilers on CNN models.
triton-lang/triton
Development repository for the Triton language and compiler
donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
aalhour/awesome-compilers
:sunglasses: Curated list of awesome resources on Compilers, Interpreters and Runtimes
google/benchmark
A microbenchmark support library
tensorflow/runtime
A performant and modular runtime for TensorFlow