Pinned Repositories
AdaTune
This is the implementation for the paper: AdaTune: Adaptive Tensor Program Compilation Made Efficient (NeurIPS 2020).
algorithms-sedgewick-wayne
Solutions to all the exercises of the Algorithms book by Robert Sedgewick and Kevin Wayne
ArchBenchSuite
Low-level kernels to benchmark peak compute, cache bandwidth at various levels, memory bandwidth, and some basic compute routines
assignment2-2018
(Spring 2018) Assignment 2: Graph Executor with TVM
awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
BERT4Rec
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
blocking-tutorial
keras-mmoe
A Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)
leevan's Repositories
leevan/AdaTune
This is the implementation for the paper: AdaTune: Adaptive Tensor Program Compilation Made Efficient (NeurIPS 2020).
leevan/algorithms-sedgewick-wayne
Solutions to all the exercises of the Algorithms book by Robert Sedgewick and Kevin Wayne
leevan/ArchBenchSuite
Low-level kernels to benchmark peak compute, cache bandwidth at various levels, memory bandwidth, and some basic compute routines
leevan/cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
leevan/DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
leevan/dcgm-exporter
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
leevan/effective_transformer
Running BERT without Padding
leevan/generative-models
Generative Models by Stability AI
leevan/gpu-benches
collection of benchmarks to measure basic GPU capabilities
leevan/Habana_Custom_Kernel
Provides the examples to write and build Habana custom kernels using the HabanaTools
leevan/incubator-tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
leevan/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
leevan/lectures
Material for cuda-mode lectures
leevan/leetcode_101
LeetCode 101: Work through LeetCode problems with ease, together (C++)
leevan/llm.c
LLM training in simple, raw C/CUDA
leevan/maxas
Assembler for NVIDIA Maxwell architecture
leevan/mixbench
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
leevan/ml-engineering
Machine Learning Engineering Open Book
leevan/mlir-examples
A simple end-to-end example of taking an ML graph (TF2 / PyTorch) and running it on a device [cpu, gpu]
leevan/mmperf
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
leevan/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
leevan/oppo_reco
leevan/scripts
leevan/sdxl_docker
leevan/SLIDE_opt_ia
leevan/StableSR
Exploiting Diffusion Prior for Real-World Image Super-Resolution
leevan/tensorflow
An Open Source Machine Learning Framework for Everyone
leevan/tf_op_graph
A visualization tool to display TF-Grappler optimized op graph
leevan/TLCBench
Benchmark scripts for TVM
leevan/transformers-code
A hands-on, step-by-step Huggingface Transformers course; videos are updated in sync on Bilibili and YouTube