ybai62868
Design new abstractions for tensor computation on hardware to facilitate productivity and performance.
CUHK, Hong Kong
Pinned Repositories
AutoBench4TensorComputation
automatic benchmark of tensor program generation and optimization
CUDA-tutorial
This is a repo for my CUDA training code.
gluon-tutorial
Gluon Tutorial for Deep Learning Researchers && Engineers.
heterocl
HeteroCL: A Multi-Paradigm Programming Infrastructure for Software-Defined Heterogeneous Computing
How-to-Optim-Algo-with-Triton
Optimize important algorithms with Triton
OpenCL_Xilinx-Intel_HeteroCL
This is a repo which contains some details about how to use the OpenCL backend (Xilinx/Intel).
Posetrack_baseline_pytorch
This project contains all of the modules used in PoseTrack. I will write a tutorial that teaches readers with little background in deep learning or computer vision how to build a complete PoseTrack system.
programmable-accelerator-design
Research papers related to accelerator design and tensor program optimization with compiler.
STSN-Object-Detection-in-Video-with-Spatiotemporal-Sampling-Networks
This is my re-implementation of the STSN paper.
triton-mlir
Development repository for the Triton language and compiler
ybai62868's Repositories
ybai62868/How-to-Optim-Algo-with-Triton
Optimize important algorithms with Triton
ybai62868/AutoBench4TensorComputation
automatic benchmark of tensor program generation and optimization
ybai62868/gemm-benchmark
Python-based GEMM benchmark for tensor computation on NVIDIA & AMD GPUs
ybai62868/triton-mlir
Development repository for the Triton language and compiler
ybai62868/awesome-compilation-spatial-accelerators
ybai62868/efficient-lora
research work about parameter efficient fine-tuning
ybai62868/nlp-llm-compiler-paper
This is a repo for papers related to NLP, compilers, and LLMs.
ybai62868/ybai62868.github.io
ybai62868/programmable-accelerator-design
Research papers related to accelerator design and tensor program optimization with compiler.
ybai62868/RISC-V-Custom-Extension
RISC-V Extension with MLIR Dialect
ybai62868/awesome-machine-learning-in-compilers
Must-read research papers and links to tools and datasets that are related to using machine learning for compilers and systems optimisation
ybai62868/Awesome-Pruning
A curated list of neural network pruning resources.
ybai62868/buddy-mlir-1
An MLIR-Based Ideas Landing Project
ybai62868/cutlass_performance_profiling
Exploration of GEMM Performance Improvement with CUTLASS
ybai62868/hidet-artifacts
This repository is the artifact of paper "Hidet: Task Mapping Programming Paradigm for Deep Learning Tensor Programs".
ybai62868/lanyon
A content-first, sliding sidebar theme for Jekyll.
ybai62868/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github pull requests at this moment. Please submit your patches at http://reviews.llvm.org.
ybai62868/one-yolov5
A more efficient yolov5 with oneflow backend 🎉🎉🎉
ybai62868/onnx-simplifier
Simplify your onnx model
ybai62868/opencv-samples-perf-analysis
ybai62868/Tech_Blog
This is a personal technical blog describing how to become a full-stack hacker with PyTorch, MLIR, RISC-V and Spatial Accelerators.
ybai62868/ToMe
A method to increase the speed and lower the memory footprint of existing vision transformers.
ybai62868/triton-dev
Development repository for the Triton language and compiler
ybai62868/triton-mlir-benchmark
ybai62868/triton-shared
Shared Middle-Layer for Triton Compilation
ybai62868/tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
ybai62868/TVM-Demo
ybai62868/tvm_op_fusion_tuning
ybai62868/ybai62868
Config files for my GitHub profile.
ybai62868/yolov5
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite