zhangmenghao
B.S. & Ph.D., Dept. of Computer Science & Technology, Tsinghua University.
Tsinghua University
Haidian, Beijing
zhangmenghao's Stars
astra-sim/astra-sim
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
lumina-test/lumina
Lumina is a user-friendly tool to test the correctness and performance of hardware network stacks.
host-bench/rdma-bench
Benchmark Test Suite for RDMA Networks
checkpoint-restore/criu
Checkpoint/Restore tool
REServeLLM/Initializer
Initializer for KServe Cluster
Princeton-Cabernet/p4-projects
P4 codes for research projects
inet-tub/ns3-datacenter
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
AmberLJC/LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
facebookresearch/param
PArametrized Recommendation and Ai Model benchmark is a repository for developing numerous microbenchmarks as well as end-to-end networks for evaluating training and inference platforms.
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
futurewei-cloud/zeta
Zeta is a distributed platform for developing and deploying complex, elastic, and highly available multi-tenant network services.
fylimas/nsfc
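nsfc - LaTeX template for National Natural Science Foundation of China (NSFC) grant proposals (General, Young Scientists, and Regional programs)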
microsoft/NPKit
NCCL Profiling Kit
Azure/msccl
Microsoft Collective Communication Library
llvm/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
NVIDIA/open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
NVIDIA/FasterTransformer
Transformer-related optimization, including BERT and GPT
microsoft/ebpf-for-windows
eBPF implementation that runs on top of Windows
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
v2fly/v2ray-core
A platform for building proxies to bypass network restrictions.
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
NVIDIA/nccl-tests
NCCL Tests
facebookincubator/dynolog
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.
NVIDIA/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
meta-llama/llama
Inference code for Llama models
openmlsys/openmlsys-zh
《Machine Learning Systems: Design and Implementation》- Chinese Version
p4lang/p4app-switchML
Switch ML Application