zhangmenghao

B.S. & Ph.D., Tsinghua University --> Faculty, Beihang University

Tsinghua University --> Beihang University
Haidian, Beijing

zhangmenghao's Stars

huggingface/transformers
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
Language: Python 137k 1.1k 16.4k 27.5k
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: Python 85.6k 1.8k 47.9k 23k
meta-llama/llama
Inference code for Llama models
Language: Python 57.1k 525 1.1k 9.6k
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python 36.1k 348 2.9k 4.2k
v2fly/v2ray-core
A platform for building proxies to bypass network restrictions.
Language: Go 30.1k 440 9934.7k
llvm/llvm-project
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
Language: LLVM 30k 581 79.2k 12.4k
NVIDIA/open-gpu-kernel-modules
NVIDIA Linux open GPU kernel module source
Language: C 15.4k 179 3571.3k
NVIDIA/NeMo
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
Language: Python 12.6k 210 2.3k 2.6k
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Language: C++ 9.1k 97 2.1k 1k
NVIDIA/cuda-samples
Samples for CUDA Developers which demonstrates features in CUDA Toolkit
Language: C 6.7k 123 2481.9k
NVIDIA/FasterTransformer
Transformer related optimization, including BERT, GPT
Language: C++ 6k 63 625896
openmlsys/openmlsys-zh
《Machine Learning Systems: Design and Implementation》- Chinese Version
Language: TeX 4.2k 47 202440
baichuan-inc/Baichuan2
A series of large language models developed by Baichuan Intelligent Technology
Language: Python 4.1k 40 395297
checkpoint-restore/criu
Checkpoint/Restore tool
Language: C 3k 72 1.4k 609
microsoft/ebpf-for-windows
eBPF implementation that runs on top of Windows
Language: C 3k 61 1.5k 241
NVIDIA/nccl-tests
NCCL Tests
Language: Cuda 955 26 238251
AmberLJC/LLMSys-PaperList
Large Language Model (LLM) Systems Paper List
718 29 126
astra-sim/astra-sim
ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale
Language: C++ 297 14 117121
facebookincubator/dynolog
Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system like the linux kernel, CPU, disks, Intel PT, GPUs etc. Dynolog also integrates with pytorch and can trigger traces for distributed training applications.
Language: C++ 285 16 3045
fylimas/nsfc
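nsfc - LaTeX template for National Natural Science Foundation of China (NSFC) proposals (General, Young Scientists, and Regional programs)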
Language: TeX 246 5 1371
Princeton-Cabernet/p4-projects
P4 codes for research projects
Language: P4 208 13 256
p4lang/p4app-switchML
Switch ML Application
Language: C++ 175 21 4148
facebookresearch/param
PArametrized Recommendation and Ai Model benchmark is a repository for development of numerous uBenchmarks as well as end to end nets for evaluation of training and inference platforms.
Language: Python 126 23 2263
microsoft/NPKit
NCCL Profiling Kit
Language: Python 122 8 1312
inet-tub/ns3-datacenter
Language: C++ 117 8 1334
Azure/msccl
Microsoft Collective Communication Library
58 6 106
host-bench/rdma-bench
Benchmark Test Suite for RDMA Networks
Language: C++ 50 2 0 4
lumina-test/lumina
Lumina is a user-friendly tool to test the correctness and performance of hardware network stacks.
Language: Python 19 4 0 6
futurewei-cloud/zeta
Zeta is a distributed platform for developing and deploying complex, elastic, and highly available multi-tenant network services.
Language: C 18 6 4610
REServeLLM/Initializer
Initializer for KServe Cluster
Language: Shell 11
