Pinned Repositories
BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
CudaProf
A profiler for CUDA programs based on CUPTI. Similar to NVIDIA Profiler, but simpler.
jamesthez.github.io
Website of Zhen Zheng.
VersaPipe
A framework for pipelined computing on GPU
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeed-MII
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
fp6_llm
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
JamesTheZ's Repositories
JamesTheZ/VersaPipe
A framework for pipelined computing on GPU
JamesTheZ/CudaProf
A profiler for CUDA programs based on CUPTI. Similar to NVIDIA Profiler, but simpler.
JamesTheZ/jamesthez.github.io
Website of Zhen Zheng.
JamesTheZ/BladeDISC
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
JamesTheZ/flash-llm
JamesTheZ/awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
JamesTheZ/cuda_image_filtering_constant
JamesTheZ/cuda_image_filtering_shared
JamesTheZ/peizhishi.github.io
JamesTheZ/persistVGG
Pure CUDA implementation of VGG net
JamesTheZ/shell_script
One-click installation of shadowsocks, with support for the chacha20-ietf-poly1305 encryption method
JamesTheZ/SyncMicrobenchmark
This work aims at characterizing the synchronization methods in CUDA.
JamesTheZ/tensorflow
JamesTheZ/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
JamesTheZ/fp6_llm
Efficient GPU support for LLM inference with 6-bit quantization (FP6).
JamesTheZ/recom
JamesTheZ/tensorflow-internals
An open-source ebook about the TensorFlow kernel and its implementation mechanisms.
JamesTheZ/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
JamesTheZ/unlock-music
Unlock encrypted music files in the browser.
JamesTheZ/xla_hlo_dump_parse