zhangmenghao
B.S. & Ph.D., Tsinghua University → Faculty, Beihang University
Haidian, Beijing
zhangmenghao's Stars
huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
meta-llama/llama
Inference code for Llama models
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
v2fly/v2ray-core
A platform for building proxies to bypass network restrictions.
llvm/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
NVIDIA/open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
NVIDIA/cuda-samples
Samples for CUDA developers demonstrating features in the CUDA Toolkit
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
openmlsys/openmlsys-zh
"Machine Learning Systems: Design and Implementation" - Chinese version
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
checkpoint-restore/criu
Checkpoint/Restore tool
microsoft/ebpf-for-windows
eBPF implementation that runs on top of Windows
NVIDIA/nccl-tests
NCCL Tests
AmberLJC/LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
astra-sim/astra-sim
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
facebookincubator/dynolog
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.
fylimas/nsfc
nsfc - LaTeX templates for National Natural Science Foundation of China (NSFC) proposals (General, Young Scientists, and Regional programs)
Princeton-Cabernet/p4-projects
P4 codes for research projects
p4lang/p4app-switchML
Switch ML Application
facebookresearch/param
PArametrized Recommendation and AI Model benchmark (PARAM) is a repository for developing numerous micro-benchmarks as well as end-to-end networks for evaluating training and inference platforms.
microsoft/NPKit
NCCL Profiling Kit
inet-tub/ns3-datacenter
Azure/msccl
Microsoft Collective Communication Library
host-bench/rdma-bench
Benchmark Test Suite for RDMA Networks
lumina-test/lumina
Lumina is a user-friendly tool to test the correctness and performance of hardware network stacks.
futurewei-cloud/zeta
Zeta is a distributed platform for developing and deploying complex, elastic, and highly available multi-tenant network services.
REServeLLM/Initializer
Initializer for KServe Cluster