Pinned Repositories
tvm
Open deep learning compiler stack for CPU, GPU and specialized accelerators
flash-attention
Fast and memory-efficient exact attention
blislab
BLISlab: A Sandbox for Optimizing GEMM
how-to-optimize-gemm
crawler
Baidu training
HowToOptimizeGemm
LAFF
Learn the theory of linear algebra hand-in-hand with the practice of software library development.
Tsinghua_Data_Center
sed
FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
jianyuh's Repositories
jianyuh/HowToOptimizeGemm
jianyuh/about
jianyuh/asmjit
Complete x86/x64 JIT and AOT Assembler for C++
jianyuh/blislab
BLISlab: A Sandbox for Optimizing GEMM
jianyuh/cpu_gpu_profiling
jianyuh/CS378_PfCandP
CS378 Programming for Correctness and Performance
jianyuh/cutlass
CUDA Templates for Linear Algebra Subroutines
jianyuh/effectivepython
Effective Python: Second Edition — Source Code and Errata for the Book
jianyuh/FBGEMM
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
jianyuh/flash-attention
Fast and memory-efficient exact attention
jianyuh/friendLunarBirthday
Generate CSV data of my friends' Chinese lunar birthdays for Google Calendar import
jianyuh/glow
Compiler for Neural Network hardware accelerators
jianyuh/hub
jianyuh/jianyuh
jianyuh/llama-toolchain
Model components of the Llama Stack APIs
jianyuh/me
About page for my personal website
jianyuh/param
PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for evaluation of training and inference platforms.
jianyuh/pytext
A natural language modeling framework based on PyTorch
jianyuh/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
jianyuh/sparse-ads-baselines
jianyuh/sudonohup.github.com
jianyuh/tblis
TBLIS is a library and framework for performing tensor operations, especially tensor contraction, using efficient native algorithms.
jianyuh/tblis-strassen
jianyuh/torchrec-1
PyTorch domain library for recommendation systems
jianyuh/torchrec-3
PyTorch domain library for recommendation systems
jianyuh/TransformerEngine
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper GPUs, to provide better performance with lower memory utilization in both training and inference.
jianyuh/tutorials
PyTorch tutorials.
jianyuh/tvm
Open deep learning compiler stack for CPU, GPU and specialized accelerators
jianyuh/vimbackup
Vim plugin backup
jianyuh/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.