JamesTheZ

Alibaba Group

Pinned Repositories

BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
Language:C++764 35 231159
flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Language:Cuda150 5 411
BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
Language:C++1 0 01
CudaProf
A profiler for CUDA programs based on CUPTI. Similar to NVIDIA Profiler, but simpler.
Language:C4 2 00
flash-llm
Language:Cuda10
jamesthez.github.io
Website of Zhen Zheng.
Language:JavaScript20
VersaPipe
A framework for pipelined computing on GPU
Language:C++27 5 19
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language:Python33.4k 339 2.6k3.9k
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Language:Python1.7k 41 278160
fp6_llm
An efficient GPU support for LLM inference with x-bit quantization (e.g. FP6,FP5).
Language:Cuda141 4 712

JamesTheZ's Repositories

JamesTheZ/VersaPipe
A framework for pipelined computing on GPU
Language:C++27 5 19
JamesTheZ/CudaProf
A profiler for CUDA programs based on CUPTI. Similar to NVIDIA Profiler, but simpler.
Language:C4 2 00
JamesTheZ/jamesthez.github.io
Website of Zhen Zheng.
Language:JavaScript20
JamesTheZ/BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
Language:C++1 0 01
JamesTheZ/flash-llm
Language:Cuda10
JamesTheZ/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
01
JamesTheZ/cuda_image_filtering_constant
Language:C++0 1 00
JamesTheZ/cuda_image_filtering_shared
Language:C++0 1 00
JamesTheZ/peizhishi.github.io
Language:SCSS00
JamesTheZ/persistVGG
Pure CUDA implementation of VGG net
0 2 00
JamesTheZ/shell_script
One-click installation of shadowsocks, with support for the chacha20-ietf-poly1305 encryption method
Language:Shell0 1 00
JamesTheZ/SyncMicrobenchmark
This work aims at characterizing the synchronization methods in CUDA.
Language:C0 0 00
JamesTheZ/tensorflow
Language:C++0 0 00
JamesTheZ/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language:Python
JamesTheZ/fp6_llm
An efficient GPU support for LLM inference with 6-bit quantization (FP6).
JamesTheZ/recom
JamesTheZ/tensorflow-internals
An open-source ebook about the TensorFlow kernel and its implementation mechanisms.
Language:TeX0 0
JamesTheZ/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Language:C++
JamesTheZ/unlock-music
Unlock encrypted music files in the browser.
JamesTheZ/xla_hlo_dump_parse
Language:Python0 0
