limin2021's Stars
opencv/opencv
Open Source Computer Vision Library
imarvinle/awesome-cs-books
🔥 A comprehensive collection of classic programming books, covering: computer systems and networking, system architecture, algorithms and data structures, front-end development, back-end development, mobile development, databases, testing, projects and teams, professional growth for programmers, job hunting and interviews, and more
alibaba/MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
CVCUDA/CV-CUDA
CV-CUDA™ is an open-source, GPU-accelerated library for cloud-scale image processing and computer vision.
laekov/fastmoe
A fast MoE (Mixture of Experts) implementation for PyTorch
intelligent-machine-learning/dlrover
DLRover: An Automatic Distributed Deep Learning System
DefTruth/CUDA-Learn-Notes
🎉 CUDA notes / hand-written CUDA kernels for large models / C++ notes, updated occasionally: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
NVIDIA/nccl-tests
NCCL Tests
andravin/wincnn
Winograd minimal convolution algorithm generator for convolutional neural networks.
lhao499/ringattention
Transformers with Arbitrarily Large Context
volcengine/veScale
A PyTorch Native LLM Training Framework
NVIDIA/AMGX
Distributed multigrid linear solver library on GPU
bytedance/ByteTransformer
Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
zhuzilin/ring-flash-attention
Ring attention implementation with flash attention
codeplaysoftware/portBLAS
An implementation of BLAS using the SYCL open standard.
InternLM/InternEvo
sail-sg/zero-bubble-pipeline-parallelism
Zero Bubble Pipeline Parallelism
feifeibear/long-context-attention
Sequence Parallel Attention for Long Context LLM Model Training and Inference
mit-han-lab/inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
rsnemmen/OpenCL-examples
Simple OpenCL examples for exploiting GPU computing
RulinShao/LightSeq
Official repository for LightSeq: Sequence Level Parallelism for Distributed Training of Long Context Transformers
FlagOpen/FlagGems
FlagGems is an operator library for large language models implemented in Triton Language.
codeplaysoftware/portDNN
portDNN is a library implementing neural network algorithms, written using SYCL
anyscale/llm-continuous-batching-benchmarks
icl-utk-edu/blaspp
BLAS++ is a C++ wrapper around CPU and GPU BLAS (Basic Linear Algebra Subprograms), developed as part of the SLATE project.
exists-forall/striped_attention
UDC-GAC/venom
A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
lzhangbv/dear_pytorch
[ICDCS 2023] DeAR: Accelerating Distributed Deep Learning with Fine-Grained All-Reduce Pipelining
kjbartel/clmagma
OpenCL version of Matrix Algebra on GPU and Multicore Architectures (MAGMA) source releases from http://icl.cs.utk.edu/magma/index.html