xxzh12's Stars
PyHDI/Pyverilog
Python-based Hardware Design Processing Toolkit for Verilog HDL
ekiwi/open-source-formal-verification-for-chisel
tdb-alcorn/chisel-formal
chipsalliance/treadle
Chisel/Firrtl execution engine
pku-liang/ksim
cucapra/EventQueue
EQueue Dialect
soDLA-publishment/soDLA
Chisel implementation of the NVIDIA Deep Learning Accelerator (NVDLA), oriented toward self-driving acceleration
Accelergy-Project/accelergy-timeloop-infrastructure
Linux Docker environment for the DNN accelerator exploration infrastructure composed of Accelergy and Timeloop
NVlabs/timeloop
Timeloop performs modeling, mapping, and code generation for tensor algebra workloads on various accelerator architectures.
ucb-bar/dsptools
A Library of Chisel3 Tools for Digital Signal Processing
chipsalliance/chisel
Chisel: A Modern Hardware Design Language
chipsalliance/firrtl
Flexible Intermediate Representation for RTL
AutoGPTQ/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
rasbt/dora-from-scratch
LoRA and DoRA from Scratch Implementations
llvm/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
facebookresearch/bit
Code repo for the paper "BiT: Robustly Binarized Multi-distilled Transformer"
NVIDIA/CUDALibrarySamples
CUDA Library Samples
google/minimalloc
A lightweight memory allocator for hardware-accelerated machine learning
tlc-pack/cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
FMInference/FlexGen
Running large language models on a single GPU for throughput-oriented scenarios.
itlab-vision/opencv-samples-perf-analysis
ggerganov/llama.cpp
LLM inference in C/C++
chengzeyi/stable-fast
An inference performance optimization framework for HuggingFace Diffusers on NVIDIA GPUs.
NVIDIA/nvbench
CUDA Kernel Benchmarking Library
triton-lang/triton
Development repository for the Triton language and compiler
KULeuven-MICAS/zigzag
HW Architecture-Mapping Design Space Exploration Framework for Deep Learning Accelerators
buddy-compiler/buddy-mlir
An MLIR-based compiler framework that bridges DSLs (domain-specific languages) to DSAs (domain-specific architectures).
mit-han-lab/inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
DNN-Accelerators/Open-Source-IPs
snuspl/nimble
Lightweight and Parallel Deep Learning Framework