xxzh12's Stars
PyHDI/Pyverilog
Python-based Hardware Design Processing Toolkit for Verilog HDL
ekiwi/open-source-formal-verification-for-chisel
tdb-alcorn/chisel-formal
chipsalliance/treadle
Chisel/Firrtl execution engine
pku-liang/ksim
cucapra/EventQueue
EQueue Dialect
soDLA-publishment/soDLA
Chisel implementation of the NVIDIA Deep Learning Accelerator (NVDLA), oriented toward self-driving acceleration
Accelergy-Project/accelergy-timeloop-infrastructure
Linux Docker environment for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop
NVlabs/timeloop
Timeloop performs modeling, mapping, and code generation for tensor algebra workloads on various accelerator architectures.
ucb-bar/dsptools
A Library of Chisel3 Tools for Digital Signal Processing
chipsalliance/chisel
Chisel: A Modern Hardware Design Language
chipsalliance/firrtl
Flexible Intermediate Representation for RTL
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
rasbt/dora-from-scratch
LoRA and DoRA from Scratch Implementations
llvm/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
facebookresearch/bit
Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer"
NVIDIA/CUDALibrarySamples
CUDA Library Samples
google/minimalloc
A lightweight memory allocator for hardware-accelerated machine learning
tlc-pack/cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
itlab-vision/opencv-samples-perf-analysis
ggerganov/llama.cpp
LLM inference in C/C++
chengzeyi/stable-fast
An inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
NVIDIA/nvbench
CUDA Kernel Benchmarking Library
triton-lang/triton
Development repository for the Triton language and compiler
KULeuven-MICAS/zigzag
HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators
buddy-compiler/buddy-mlir
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
mit-han-lab/inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
DNN-Accelerators/Open-Source-IPs
snuspl/nimble
Lightweight and Parallel Deep Learning Framework