Pinned Repositories
tvm
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
AICS-Course
Exercise answers, lab answers, and course notes for "AI Computing Systems" (《智能计算系统》)
AutoGPTQ.tvm
GPTQ inference TVM kernel
DigitalAlarmClock
NJTech digital design project: an FPGA digital alarm system for the Nexys A7 100T
FPGA
Helps newcomers get started with FPGAs and shares excellent FPGA-related articles and projects
tvm_gpu_gemm
Playing with GEMM in TVM
VehicleFlowDetection
Implementation of vehicle flow statistics based on TensorFlow and YOLOv3, with a PyQt5 GUI.
ZYNQ-NVDLA
NVDLA (an open-source DL accelerator framework) implementation on FPGA.
BitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
LeiWang1999's Repositories
LeiWang1999/ZYNQ-NVDLA
NVDLA (an open-source DL accelerator framework) implementation on FPGA.
LeiWang1999/tvm_gpu_gemm
Playing with GEMM in TVM
LeiWang1999/AutoGPTQ.tvm
GPTQ inference TVM kernel
LeiWang1999/VehicleFlowDetection
Implementation of vehicle flow statistics based on TensorFlow and YOLOv3, with a PyQt5 GUI.
LeiWang1999/leiblog.wang
My New Blog Powered by HEXO http://leiblog.wang
LeiWang1999/rocblas-benchmark
LeiWang1999/BitBLAS
LeiWang1999/memfusion_artifact
LeiWang1999/cv
resume.
LeiWang1999/mlc-benchmark
LeiWang1999/tvm
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
LeiWang1999/cutlass
LeiWang1999/Ladder
@DataStructures_Cbased I'm Coming!
LeiWang1999/Roller
Build and train AlexNet with PyTorch, run prediction with TVM and PyTorch, and compare their performance
LeiWang1999/_cutlass
CUDA Templates for Linear Algebra Subroutines
LeiWang1999/MSBitBLAS
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
LeiWang1999/nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executable from a DNN model description.
LeiWang1999/vLLM
LeiWang1999/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
LeiWang1999/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
LeiWang1999/AutoGPTQ_nf
LeiWang1999/gptq_faster
Faster 3-bit CUDA kernel for GPTQ.
LeiWang1999/LeiWang1999
LeiWang1999/mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
LeiWang1999/nmsparse
LeiWang1999/nni
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
LeiWang1999/ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
LeiWang1999/relax
LeiWang1999/vllm-bitblas
A high-throughput and memory-efficient inference and serving engine for LLMs
LeiWang1999/Welder_artifacts
OSDI 2023 Welder artifacts