yzh119

who I am is where I stand, where I stand is where I fall

@flashinfer-ai
Seattle, WA

Pinned Repositories

tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Language: Python 11.9k 377 3.4k3.5k
dgl
Python package built to ease deep learning on graph, on top of existing DL frameworks.
Language: Python 13.6k 177 2.8k3k
flashinfer
FlashInfer: Kernel Library for LLM Serving
Language: Cuda 1.6k 21 148160
mlc-llm
Universal LLM Deployment Engine with ML Compilation
Language: Python 19.4k 177 1.4k1.6k
punica
Serving multiple LoRA finetuned LLM as one
Language: Python 1k 12 3946
SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
Language: Python 133 6 7114
BPT
Source code of paper "BP-Transformer: Modelling Long-Range Context via Binary Partitioning"
Language: Python 126 6 420
language-grounding-experiments
To do experiments on language grounding.
Language: Jupyter Notebook 11 5 01
segtree-transformer-v0
Code for SegTree Transformer (ICLR-RLGM 2019).
Language: Python 27 4 01

yzh119's Repositories

yzh119/bibfetch
Fetch bibtex entries from academic search engines like dblp.
Language: Python 3 2 00
yzh119/mirage
A multi-level tensor algebra superoptimizer
Language: C++ 2 0 0
yzh119/punica
Serving multiple LoRA finetuned LLM as one
Language: Python 2 1 0
yzh119/mlc-llm
Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.
Language: Python 1 1 0
yzh119/relax
Temp repo for prototyping relax(relay next), the effort will be upstreamed. We use the wiki pages on this repo to host design docs.
Language: Python 1 1 0
yzh119/cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ 2 0
yzh119/dgsparse
Language: Cuda 1 01
yzh119/flashinfer-ai.github.io
Project website of FlashInfer project
Language: HTML 1 0
yzh119/flashinfer-dev
FlashInfer: Kernel Library for LLM Serving
Language: Cuda 0 0
yzh119/kernels
Language: Python 0 0
yzh119/llm-perf-bench
Language: Shell 1 0
yzh119/metal-benchmarks
Apple GPU microarchitecture
Language: Metal 1 0
yzh119/mlx
MLX: An array framework for Apple silicon
Language: C++ 1 0
yzh119/nccl
Optimized primitives for collective multi-GPU communication
Language: C++ 1 0
yzh119/NetHack
Official NetHack Git Repository
yzh119/open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
Language: C 1 0
yzh119/pbrt-v4
Source code to pbrt, the ray tracer described in the forthcoming 4th edition of the "Physically Based Rendering: From Theory to Implementation" book.
Language: C++ 0 0
yzh119/relax-sparse
Temp repo for prototyping relax(relay next), the effort will be upstreamed. We use the wiki pages on this repo to host design docs.
Language: Python 1 0
yzh119/sglang
SGLang is a fast serving framework for large language models and vision language models.
Language: Python 0 0
yzh119/smoothquant
SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models
Language: Python 1 0
yzh119/texmacs
Source Code of GNU TeXmacs, Developers Guide ==>
Language: Tcl 0 0
yzh119/tlcpack
Language: Groovy 2 0
yzh119/triton
Development repository for the Triton language and compiler
Language: C++ 1 0
yzh119/tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Language: Python 4 0
yzh119/tvm-rfcs
A home for the final text of all TVM RFCs.
1 0
yzh119/utils
Language: Python 1 0
yzh119/uwsampl.github.io
The UW SAMPL group's website.
Language: HTML 2 0
yzh119/web-data
2 0
yzh119/web-llm
Bringing large-language models and chat to web browsers. Everything runs inside the browser with no server support.
Language: Python 1 0
yzh119/web-stable-diffusion
Bringing stable diffusion models to web browsers. Everything runs inside the browser with no server support.
Language: Jupyter Notebook 1 0