Pinned Repositories
Android
benchmarks
bitwise_spgemm
ccf-deadlines
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet). If you find it useful, please star this project, thanks~
CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
cuda_hook
Hooks CUDA-related dynamic libraries using automated code-generation tools.
datamining
Learning projects in data mining
PAME
Early exits for DNNs with TensorRT
x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
ZSL98's Repositories
ZSL98/PAME
Early exits for DNNs with TensorRT
ZSL98/x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
ZSL98/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
ZSL98/benchmarks
ZSL98/bitwise_spgemm
ZSL98/ccf-deadlines
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet). If you find it useful, please star this project, thanks~
ZSL98/CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
ZSL98/cuda_hook
Hooks CUDA-related dynamic libraries using automated code-generation tools.
ZSL98/DeepLearningExamples
Deep Learning Examples
ZSL98/Dubhe-proof
ZSL98/diffusion_policy
[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
ZSL98/fastmoe
A fast MoE implementation for PyTorch
ZSL98/FedML
A Research-oriented Federated Learning Library. Supporting distributed computing, mobile/IoT on-device training, and standalone simulation. A short version of our white paper has been accepted by NeurIPS 2020 workshop.
ZSL98/gdev
First-Class GPU Resource Management: Device Drivers, Runtimes, and CUDA Compilers for Nouveau.
ZSL98/leaf
Leaf: A Benchmark for Federated Settings
ZSL98/Megatron-LM
Ongoing research training transformer models at scale
ZSL98/nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
ZSL98/nvsci
Linux kernel modules for secure sharing of memory buffers
ZSL98/orion
An interference-aware scheduler for fine-grained GPU sharing
ZSL98/Shallow-Deep-Networks
Source Code for ICML 2019 Paper "Shallow-Deep Networks: Understanding and Mitigating Network Overthinking"
ZSL98/Swin-Transformer
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
ZSL98/TBsche
ZSL98/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
ZSL98/TGS
Artifacts for our NSDI'23 paper TGS
ZSL98/Train-Nvidia
ZSL98/tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
ZSL98/tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
ZSL98/verl
veRL: Volcano Engine Reinforcement Learning for LLM
ZSL98/website
ZSL98/zsl98.github.io
Shulai Zhang's Homepage