suisiyuan

To do a good job, one must first sharpen one's tools.

Shanghai

suisiyuan's Stars

microsoft/msccl
Microsoft Collective Communication Library
Language: C++ · Stars: 304 · Forks: 29
ROCm/amd_matrix_instruction_calculator
A tool for generating information about the matrix multiplication instructions in AMD Radeon™ and AMD Instinct™ accelerators
Language: Python · Stars: 60 · Forks: 5
ROCm/AMDMIGraphX
AMD's graph optimization engine.
Language: C++ · Stars: 183 · Forks: 84
ROCm/ROCm
AMD ROCm™ Software - GitHub Home
Language: Shell · Stars: 4.5k · Forks: 370
volcengine/veScale
A PyTorch Native LLM Training Framework
Language: Python · Stars: 575 · Forks: 28
NVIDIA/cutlass
CUDA Templates for Linear Algebra Subroutines
Language: C++ · Stars: 5.4k · Forks: 903
NVIDIA/TensorRT-LLM
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Language: C++ · Stars: 8.2k · Forks: 906
pigirons/cpufp
A CPU tool for benchmarking the peak of floating points
Language: Assembly · Stars: 472 · Forks: 120
intel/xFasterTransformer
Language: C++ · Stars: 348 · Forks: 60
bytedance/ByteMLPerf
AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and versatility of software and hardware.
Language: Python · Stars: 188 · Forks: 50
microsoft/DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Language: Python · Stars: 34.7k · Forks: 4k
OpenPPL/ppl.nn.llm
Stars: 140 · Forks: 18
OpenPPL/ppl.llm.kernel.cuda
Language: C++ · Stars: 133 · Forks: 24
OpenPPL/ppl.pmx
Language: Python · Stars: 56 · Forks: 15
OpenPPL/ppl.llm.serving
Language: C++ · Stars: 123 · Forks: 13
meta-llama/llama
Inference code for Llama models
Language: Python · Stars: 55.5k · Forks: 9.5k
Significant-Gravitas/AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
Language: Python · Stars: 167k · Forks: 44.1k
THUDM/ChatGLM-6B
ChatGLM-6B: An Open Bilingual Dialogue Language Model
Language: Python · Stars: 40.4k · Forks: 5.2k
bytedance/effective_transformer
Running BERT without Padding
Language: C++ · Stars: 455 · Forks: 52
torvalds/linux
Linux kernel source tree
Language: C · Stars: 178k · Forks: 53.2k
open-mmlab/mmengine
OpenMMLab Foundational Library for Training Deep Learning Models
Language: Python · Stars: 1.1k · Forks: 340
PaddlePaddle/Paddle
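PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of 飞桨 (PaddlePaddle): high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)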
Language: C++ · Stars: 22.1k · Forks: 5.5k
bannedbook/fanqiang
Circumventing the firewall: "scientific" (unrestricted) internet access
Language: Kotlin · Stars: 38.1k · Forks: 7.2k
pytorch/pytorch
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Language: Python · Stars: 82.2k · Forks: 22.1k
hwdsl2/docker-ipsec-vpn-server
Docker image to run an IPsec VPN server, with IPsec/L2TP, Cisco IPsec and IKEv2
Language: Shell · Stars: 6.4k · Forks: 1.4k
fatedier/frp
A fast reverse proxy to help you expose a local server behind a NAT or firewall to the internet.
Language: Go · Stars: 84.4k · Forks: 13.2k
OpenMathLib/OpenBLAS
OpenBLAS is an optimized BLAS library based on GotoBLAS2 1.13 BSD version.
Language: C · Stars: 6.3k · Forks: 1.5k
open-mmlab/mmdeploy
OpenMMLab Model Deployment Framework
Language: Python · Stars: 2.7k · Forks: 618
NVIDIA/CUDALibrarySamples
CUDA Library Samples
Language: Cuda · Stars: 1.5k · Forks: 318
NVIDIA/cuda-samples
Samples for CUDA developers that demonstrate features in the CUDA Toolkit
Language: C · Stars: 6.1k · Forks: 1.8k
