nccl
There are 35 repositories under the nccl topic.
cupy/cupy
NumPy & SciPy for GPU
coreylowman/cudarc
Safe Rust wrapper around the CUDA toolkit
huggingface/llm_training_handbook
An open collection of methodologies to help with successful training of large language models.
huggingface/large_language_model_training_playbook
An open collection of implementation tips, tricks and resources for training large language models
LambdaLabsML/distributed-training-guide
Best practices & guides on how to write distributed PyTorch training code
Bluefog-Lib/bluefog
Distributed and decentralized training framework for PyTorch over graph
FZJ-JSC/tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
microsoft/msrflute
Federated Learning Utilities and Tools for Experimentation
google/nccl-fastsocket
NCCL Fast Socket is a transport layer plugin to improve NCCL collective communication performance on Google Cloud.
JuliaGPU/NCCL.jl
A Julia wrapper for the NVIDIA Collective Communications Library.
muriloboratto/NCCL
Examples of how to call collective operation functions in multi-GPU environments. Includes simple examples of the broadcast, reduce, allGather, reduceScatter and sendRecv operations.
lanl/pyDNMFk
Python Distributed Non Negative Matrix Factorization with custom clustering
openhackathons-org/nways_multi_gpu
N-Ways to Multi-GPU Programming
BaguaSys/bagua-net
High performance NCCL plugin for Bagua.
1duo/nccl-examples
NCCL Examples from Official NVIDIA NCCL Developer Guide.
YinLiu-91/ncclOperationPlus
Uses ncclSend and ncclRecv to implement ncclSendrecv, ncclGather, ncclScatter and ncclAlltoall
asprenger/distributed-training-patterns
Experiments with low-level communication patterns that are useful for distributed training.
UCBerkeley-Spring2022-CS267-project/blinkplus
Blink+: Increase GPU group bandwidth by using NVLink across tenants.
lancelee82/pynccl
NVIDIA NCCL2 Python bindings using ctypes and numba.
muriloboratto/hands-on-supercomputing-with-parallel-computing
Hands-on Labs in Parallel Computing
rohwid/auto-nvidia-cuda-driver
Script that automatically installs the NVIDIA driver and CUDA on Ubuntu
YconquestY/nccl
Summary of call graphs and data structures of the NVIDIA Collective Communications Library (NCCL)
lcskrishna/nccl-rccl-parser
Tool to run rccl-tests/nccl-tests based on an application and gather performance data.
dereklstinson/nccl
Go wrapper for NCCL
lancelee82/necklace
Distributed deep learning framework based on PyTorch, Numba, NCCL and ZeroMQ.
MurrellGroup/Conflux.jl
Single-node data parallelism in Julia with CUDA
rodhuega/tfgMatrixNccl
Library of mathematical operations on multi-GPU matrices using NVIDIA NCCL.
superlinear-ai/scipy-notebook-gpu
jupyter/scipy-notebook with CUDA Toolkit, cuDNN, NCCL, and TensorRT
GPU-Blood-Cell-Simulation/Simulation-Server
Blood Cell Simulation server
melanie-t27/2024-EUMaster4HPC-Student-Challenge
EUMaster4HPC student challenge group 7 - EuroHPC Summit 2024 Antwerp
nikombr/hpccuda
Advanced High Performance Computing in C with OpenMP, CUDA, MPI and NCCL. The project folder contains my final project for the special course: a Jacobi solver for the Poisson partial differential equation, implemented with OpenMP on the CPU, with CUDA on a single GPU, and with CUDA, MPI and NCCL on multiple GPUs.
SquareFactory/ml-default
Default Docker image used to run experiments on csquare.run.
sub-mod/nccl-builds
NCCL built on CentOS 6
TyBruceChen/Tutorial-Conda-cuDNN-NCCL-installation-for-Pytorch
This is a tutorial for installing CUDA (v11.8) and cuDNN (8.6.9) to enable GPU programming with PyTorch. It also covers the use of NCCL for distributed GPU training of DNN models.