leevan

Pinned Repositories

AdaTune
This is the implementation for the paper: AdaTune: Adaptive Tensor Program Compilation Made Efficient (NeurIPS 2020).
Language: Python
algorithms-sedgewick-wayne
Solutions to all the exercises of the Algorithms book by Robert Sedgewick and Kevin Wayne
Language: Java
ArchBenchSuite
low level kernels to benchmark peak compute, cache bandwidth on various levels, memory bandwidth, and some basic compute routines
Language: C++
assignment2-2018
(Spring 2018) Assignment 2: Graph Executor with TVM
Language: Python
awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
BERT4Rec
BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer
Language: Python
blocking-tutorial
Language: C++
keras-mmoe
A Keras implementation of "Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts" (KDD 2018)
Language: Python

leevan's Repositories

leevan/AdaTune
This is the implementation for the paper: AdaTune: Adaptive Tensor Program Compilation Made Efficient (NeurIPS 2020).
Language: Python
leevan/algorithms-sedgewick-wayne
Solutions to all the exercises of the Algorithms book by Robert Sedgewick and Kevin Wayne
Language: Java
leevan/ArchBenchSuite
low level kernels to benchmark peak compute, cache bandwidth on various levels, memory bandwidth, and some basic compute routines
Language: C++
leevan/cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruction.
leevan/DCGM
NVIDIA Data Center GPU Manager (DCGM) is a project for gathering telemetry and measuring the health of NVIDIA GPUs
leevan/dcgm-exporter
NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
leevan/effective_transformer
Running BERT without Padding
Language: C++
leevan/generative-models
Generative Models by Stability AI
leevan/gpu-benches
collection of benchmarks to measure basic GPU capabilities
leevan/Habana_Custom_Kernel
Provides the examples to write and build Habana custom kernels using the HabanaTools
leevan/incubator-tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
leevan/ktransformers
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
leevan/lectures
Material for cuda-mode lectures
leevan/leetcode_101
LeetCode 101: Working Through LeetCode Problems Together with Ease (C++)
leevan/llm.c
LLM training in simple, raw C/CUDA
Language: Cuda
leevan/maxas
Assembler for NVIDIA Maxwell architecture
leevan/mixbench
A GPU benchmark tool for evaluating GPUs and CPUs on mixed operational intensity kernels (CUDA, OpenCL, HIP, SYCL, OpenMP)
Language: C++
leevan/ml-engineering
Machine Learning Engineering Open Book
leevan/mlir-examples
a simple end to end example of taking a ML graph (TF2 / PyTorch) and running it on a device [cpu, gpu]
Language: MLIR
leevan/mmperf
MatMul Performance Benchmarks for a Single CPU Core comparing both hand engineered and codegen kernels.
leevan/onnxruntime
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
leevan/oppo_reco
leevan/scripts
Language: Python
leevan/sdxl_docker
leevan/SLIDE_opt_ia
Language: C++
leevan/StableSR
Exploiting Diffusion Prior for Real-World Image Super-Resolution
Language: Python
leevan/tensorflow
An Open Source Machine Learning Framework for Everyone
leevan/tf_op_graph
A visualization tool to display TF-Grappler optimized op graph
leevan/TLCBench
Benchmark scripts for TVM
leevan/transformers-code
A hands-on, step-by-step Huggingface Transformers course; the course videos are updated in sync on Bilibili and YouTube
Language: Jupyter Notebook