suisiyuan's Stars
microsoft/msccl
Microsoft Collective Communication Library
ROCm/amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
ROCm/AMDMIGraphX
AMD's graph optimization engine.
ROCm/ROCm
AMD ROCm™ Software - GitHub Home
volcengine/veScale
A PyTorch Native LLM Training Framework
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
pigirons/cpufp
A CPU tool for benchmarking peak floating-point performance
intel/xFasterTransformer
bytedance/ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
OpenPPL/ppl.nn.llm
OpenPPL/ppl.llm.kernel.cuda
OpenPPL/ppl.pmx
OpenPPL/ppl.llm.serving
meta-llama/llama
Inference code for Llama models
Significant-Gravitas/AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
THUDM/ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
bytedance/effective_transformer
Running BERT without Padding
torvalds/linux
Linux kernel source tree
open-mmlab/mmengine
OpenMMLab Foundational Library for Training Deep Learning Models
PaddlePaddle/Paddle
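PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)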
bannedbook/fanqiang
Bypassing the firewall: unrestricted internet access
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
hwdsl2/docker-ipsec-vpn-server
Docker image to run an IPsec VPN server, with IPsec/L2TP, Cisco IPsec and IKEv2
fatedier/frp
A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.
OpenMathLib/OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
open-mmlab/mmdeploy
OpenMMLab Model Deployment Framework
NVIDIA/CUDALibrarySamples
CUDA Library Samples
NVIDIA/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit