Pinned Repositories
alphatensor
awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
FeatherCNN
FeatherCNN is a high performance inference engine for convolutional neural networks.
Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
gluon-cv
Gluon CV Toolkit
hipBLAS
ROCm BLAS marshalling library
HowToCook
Programmer's guide to cooking at home (Chinese only).
LibShalom
MYLIB
TPDS
AnonymousYWL's Repositories
AnonymousYWL/LibShalom
AnonymousYWL/MYLIB
AnonymousYWL/TPDS
AnonymousYWL/alphatensor
AnonymousYWL/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
AnonymousYWL/FeatherCNN
FeatherCNN is a high performance inference engine for convolutional neural networks.
AnonymousYWL/Flash-Attention-Softmax-N
CUDA and Triton implementations of Flash Attention with SoftmaxN.
AnonymousYWL/gluon-cv
Gluon CV Toolkit
AnonymousYWL/hipBLAS
ROCm BLAS marshalling library
AnonymousYWL/HowToCook
Programmer's guide to cooking at home (Chinese only).
AnonymousYWL/incubator-mxnet
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
AnonymousYWL/lingweiyang.github.io
AnonymousYWL/intel-extension-for-transformers
⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡
AnonymousYWL/LLM-Viewer
Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
AnonymousYWL/models
A collection of pre-trained, state-of-the-art models in the ONNX format
AnonymousYWL/oneDNN
oneAPI Deep Neural Network Library (oneDNN)
AnonymousYWL/Optimizing-DGEMM-on-Intel-CPUs-with-AVX512F
Stepwise optimizations of DGEMM on CPU, eventually outperforming Intel MKL.
AnonymousYWL/rankfm
Factorization Machines for Recommendation and Ranking Problems with Implicit Feedback Data
AnonymousYWL/SMM
AnonymousYWL/sparse-register-tiling
AnonymousYWL/STM-Multifrontal-QR-Factorization-Empowered-by-GCN
AnonymousYWL/TileSpGEMM
Source code of the PPoPP '22 paper: "TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs" by Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, and Weifeng Liu.
AnonymousYWL/tsm2x-imp
Implementation of TSM2L and TSM2R -- High-Performance Tall-and-Skinny Matrix-Matrix Multiplication Algorithms for CUDA