jianyuh

Beat the speed of light.

Pinned Repositories

tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Language: Python 11.6k 377 3.4k 3.4k
flash-attention
Fast and memory-efficient exact attention
Language: Python 13.6k 115 1k 1.2k
blislab
BLISlab: A Sandbox for Optimizing GEMM
Language: C 469 16 199
how-to-optimize-gemm
Language: C 1.7k 44 18353
crawler
Baidu training
Language: C++ 3 2 0 1
HowToOptimizeGemm
Language: C 1 3 0 0
LAFF
Learn the theory of linear algebra hand-in-hand with the practice of software library development.
Language: JavaScript 1 2 0 0
Tsinghua_Data_Center
sed
Language: Python 4 2 0 0
FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Language: C++ 1.2k 66 165479
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: Python 82.5k 1.7k 45.3k 22.2k

jianyuh's Repositories

jianyuh/HowToOptimizeGemm
Language: C 1 3 0 0
jianyuh/about
Language: HTML 2 0
jianyuh/asmjit
Complete x86/x64 JIT and AOT Assembler for C++
Language: C++ 2 0
jianyuh/blislab
BLISlab: A Sandbox for Optimizing GEMM
Language: C 2 0
jianyuh/cpu_gpu_profiling
Language: C 2 0
jianyuh/CS378_PfCandP
CS378 Programming for Correctness and Performance
Language: C 3 0
jianyuh/cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ 1 0
jianyuh/effectivepython
Effective Python: Second Edition — Source Code and Errata for the Book
Language: Python 2 0
jianyuh/FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
Language: C++ 3 0
jianyuh/flash-attention
Fast and memory-efficient exact attention
Language: Python 1 0
jianyuh/friendLunarBirthday
Generate CSV-format data of my friends' Chinese lunar birthdays for Google Calendar import
Language: Python 2 0
jianyuh/glow
Compiler for Neural Network hardware accelerators
Language: C++ 2 0
jianyuh/hub
Language: Python 2 0
jianyuh/jianyuh
jianyuh/llama-toolchain
Model components of the Llama Stack APIs
jianyuh/me
about page for my personal website
Language: CSS 2 0
jianyuh/param
PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for evaluation of training and inference platforms.
jianyuh/pytext
A natural language modeling framework based on PyTorch
Language: Python 2 0
jianyuh/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: C++ 3 0
jianyuh/sparse-ads-baselines
Language: Python 2 0
jianyuh/sudonohup.github.com
Language: HTML 2 0 1
jianyuh/tblis
TBLIS is a library and framework for performing tensor operations, especially tensor contraction, using efficient native algorithms.
Language: C 3 0
jianyuh/tblis-strassen
3 0
jianyuh/torchrec-1
PyTorch domain library for recommendation systems
Language: Python 2 0
jianyuh/torchrec-3
PyTorch domain library for recommendation systems
Language: Python 1 0
jianyuh/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
Language: Python 1 0
jianyuh/tutorials
PyTorch tutorials.
Language: Jupyter Notebook 2 0
jianyuh/tvm
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Language: Python 2 0
jianyuh/vimbackup
vim plugin backup
Language: VimL 2 0
jianyuh/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
Language: Python 1 0