Pinned Repositories
tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
AICS-Course
Exercise answers, lab answers, and course notes for "AI Computing Systems" (智能计算系统)
AutoGPTQ.tvm
GPTQ inference TVM kernel
DigitalAlarmClock
NJTech digital design project: an FPGA digital alarm system on the Nexys A7 100T
FPGA
Helps newcomers get started with FPGAs by sharing excellent FPGA-related articles and projects
tvm_gpu_gemm
Playing with GEMM in TVM
VehicleFlowDetection
Implementation of vehicle flow statistics based on TensorFlow and YOLOv3, with a PyQt5 GUI.
ZYNQ-NVDLA
NVDLA (an open-source DL accelerator framework) implementation on FPGA.
nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
Tengine
Tengine is a lightweight, high-performance, modular inference engine for embedded devices
LeiWang1999's Repositories
LeiWang1999/ZYNQ-NVDLA
NVDLA (an open-source DL accelerator framework) implementation on FPGA.
LeiWang1999/AICS-Course
Exercise answers, lab answers, and course notes for "AI Computing Systems" (智能计算系统)
LeiWang1999/tvm_gpu_gemm
Playing with GEMM in TVM
LeiWang1999/AutoGPTQ.tvm
GPTQ inference TVM kernel
LeiWang1999/VehicleFlowDetection
Implementation of vehicle flow statistics based on TensorFlow and YOLOv3, with a PyQt5 GUI.
LeiWang1999/HPC-Course
LeiWang1999/leiblog.wang
My new blog, powered by Hexo. http://leiblog.wang
LeiWang1999/rocblas-benchmark
LeiWang1999/LeiBlog
A blog built from scratch with Vuetify.js + Vue.js + Node.js (Koa). http://leiblog.wang
LeiWang1999/cv
resume.
LeiWang1999/tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
LeiWang1999/compiler-and-arch
A list of tutorials, papers, talks, and open-source projects on emerging compilers and architectures
LeiWang1999/cutlass
LeiWang1999/mlc-benchmark
LeiWang1999/_cutlass
CUDA Templates for Linear Algebra Subroutines
LeiWang1999/nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates a high-performance executable from a DNN model description.
LeiWang1999/antares
Antares: an automatic engine for multi-platform kernel generation and optimization. Supporting CPU, CUDA, ROCm, DirectX12, GraphCore, SYCL for CPU/GPU, OpenCL for AMD/NVIDIA, Android CPU/GPU backends.
LeiWang1999/apex
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
LeiWang1999/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
LeiWang1999/AutoGPTQ_nf
LeiWang1999/ComputeShaderPlayground
Compute Shader Playground with DirectX12
LeiWang1999/gptq_faster
Faster 3-bit CUDA kernel for GPTQ.
LeiWang1999/LeiWang1999
LeiWang1999/mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
LeiWang1999/nmsparse
LeiWang1999/nni
An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
LeiWang1999/ppq
PPL Quantization Tool (PPQ) is a powerful offline neural network quantization tool.
LeiWang1999/relax
LeiWang1999/vllm-bitblas
A high-throughput and memory-efficient inference and serving engine for LLMs
LeiWang1999/Welder_artifacts
OSDI 2023 Welder artifacts