Wenzha0Wu's Stars
jeffhammond/STREAM
STREAM benchmark
jameslinsjtu/swCandle
The micro-benchmark suite to evaluate the micro-architecture of China's home-grown many-core processor SW26010
hngenc/systolic-array
A DSL for Systolic Arrays
Xilinx/inference-server
gabime/spdlog
Fast C++ logging library.
microsoft/MMdnn
MMdnn is a set of tools to help users inter-operate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX and CoreML.
ceccocats/tkDNN
Deep neural network library and toolkit for high-performance inference on NVIDIA Jetson platforms
mlcommons/inference_policies
Issues related to MLPerf™ Inference policies, including rules and suggested changes
mlcommons/inference
Reference implementations of MLPerf™ inference benchmarks
mlcommons/inference_results_v3.0
This repository contains the results and code for the MLPerf™ Inference v3.0 benchmark.
NVlabs/tiny-cuda-nn
Lightning fast C++/CUDA neural network framework
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
nomic-ai/gpt4all
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
dmlc/xgboost
Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Dask, Flink and DataFlow
andersy005/tvm-in-action
TVM stack: exploring the incredible explosion of deep-learning frameworks and how to bring them together
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
cornell-zhang/heterocl
HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
arcsysu/SYsU-lang
A mini, simple and modular compiler for SYsU/SysY (a tiny C). Based on Clang/LLVM/ANTLR4/Bison/Flex.
bytedance/byteir
A model compilation solution for various hardware
amazon-science/FeatGraph
cloneofsimo/lora
Using Low-rank adaptation to quickly fine-tune diffusion models.
tlc-pack/relax
MegEngine/MegEngine
MegEngine is a fast, scalable, easy-to-use deep learning framework with automatic differentiation
Xilinx/Vitis-AI
Vitis AI is Xilinx’s development stack for AI inference on Xilinx hardware platforms, including both edge devices and Alveo cards.
buaa-hipo/dlcompiler-comparison
A quantitative performance comparison among DL compilers on CNN models.
triton-lang/triton
Development repository for the Triton language and compiler
donnemartin/system-design-primer
Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards.
aalhour/awesome-compilers
:sunglasses: Curated list of awesome resources on Compilers, Interpreters and Runtimes
google/benchmark
A microbenchmark support library
tensorflow/runtime
A performant and modular runtime for TensorFlow