Pinned Repositories
Android
benchmarks
bitwise_spgemm
ccf-deadlines
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet). If you find it useful, please star this project, thanks~
CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
cuda_hook
Hooks CUDA-related dynamic libraries using automated code-generation tools.
datamining
Learning projects in data mining
PAME
Early exits for DNNs with TensorRT
x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
ZSL98's Repositories
ZSL98/PAME
Early exits for DNNs with TensorRT
ZSL98/x-transformers
A simple but complete full-attention transformer with a set of promising experimental features from various papers
ZSL98/xformers
Hackable and optimized Transformers building blocks, supporting a composable construction.
ZSL98/benchmarks
ZSL98/bitwise_spgemm
ZSL98/ccf-deadlines
⏰ Collaboratively track deadlines of conferences recommended by CCF (website, Python CLI, WeChat applet). If you find it useful, please star this project, thanks~
ZSL98/CuAssembler
An unofficial CUDA assembler, for all generations of SASS, hopefully :)
ZSL98/cuda_hook
Hooks CUDA-related dynamic libraries using automated code-generation tools.
ZSL98/DeepLearningExamples
Deep Learning Examples
ZSL98/Dubhe-proof
ZSL98/diffusion_policy
[RSS 2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
ZSL98/fastmoe
A fast MoE implementation for PyTorch
ZSL98/FedML
A Research-oriented Federated Learning Library. Supporting distributed computing, mobile/IoT on-device training, and standalone simulation. A short version of our white paper has been accepted by NeurIPS 2020 workshop.
ZSL98/gdev
First-Class GPU Resource Management: Device Drivers, Runtimes, and CUDA Compilers for Nouveau.
ZSL98/leaf
Leaf: A Benchmark for Federated Settings
ZSL98/Megatron-LM
Ongoing research training transformer models at scale
ZSL98/nnfusion
A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description.
ZSL98/nvsci
Linux kernel modules for secure sharing of memory buffers
ZSL98/orion
An interference-aware scheduler for fine-grained GPU sharing
ZSL98/Shallow-Deep-Networks
Source Code for ICML 2019 Paper "Shallow-Deep Networks: Understanding and Mitigating Network Overthinking"
ZSL98/Swin-Transformer
This is an official implementation for "Swin Transformer: Hierarchical Vision Transformer using Shifted Windows".
ZSL98/TBsche
ZSL98/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
ZSL98/TGS
Artifacts for our NSDI'23 paper TGS
ZSL98/Train-Nvidia
ZSL98/tutel
Tutel MoE: An Optimized Mixture-of-Experts Implementation
ZSL98/tvm
Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators
ZSL98/verl
veRL: Volcano Engine Reinforcement Learning for LLM
ZSL98/website
ZSL98/zsl98.github.io
Shulai Zhang's Homepage