multi-gpu
There are 79 repositories under the multi-gpu topic.
ConfettiFX/The-Forge
The Forge Cross-Platform Framework: PC Windows, Steamdeck (native), Ray Tracing, macOS / iOS, Android, XBOX, PS4, PS5, Switch, Quest 2
NVIDIA/OpenSeq2Seq
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
v-iashin/video_features
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
rbbrdckybk/dream-factory
Multi-threaded GUI manager for mass creation of AI-generated art with support for multiple GPUs.
seasonSH/DocFace
Face recognition system for ID photos
NickLucche/stable-diffusion-nvidia-docker
GPU-ready Dockerfile to run the Stability.AI stable-diffusion model v2 with a simple web interface. Includes multi-GPU support.
omlins/ParallelStencil.jl
Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
lattice/quda
QUDA is a library for performing calculations in lattice QCD on GPUs.
FZJ-JSC/tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
tamerthamoqa/facenet-pytorch-glint360k
A PyTorch implementation of the 'FaceNet' paper for training a facial recognition model with Triplet Loss using the glint360k dataset. A pre-trained model using Triplet Loss is available for download.
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
bharatsingh430/py-R-FCN-multiGPU
Code for training py-faster-rcnn and py-R-FCN on multiple GPUs in caffe
eth-cscs/ImplicitGlobalGrid.jl
Almost trivial distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid
papuSpartan/stable-diffusion-webui-distributed
Chains stable-diffusion-webui instances together to facilitate faster image generation.
guotong1988/BERT-pre-training
Multi-GPU pre-training on one machine for BERT without Horovod (data parallelism)
GPUSPH/gpusph
The world's first CUDA implementation of Weakly-Compressible Smoothed Particle Hydrodynamics
celerity/celerity-runtime
High-level C++ for Accelerator Clusters
tensordiffeq/TensorDiffEq
Efficient and Scalable Physics-Informed Deep Learning and Scientific Machine Learning on top of Tensorflow for multi-worker distributed computing
tugrul512bit/Cekirdekler
Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated fast while using the same kernel on all devices (for simplicity).
projectchrono/DEM-Engine
A dual-GPU DEM solver with complex grain geometry support
rickiepark/deep-learning-with-python-2nd
Code repository for the book "Deep Learning with Python, 2nd Edition" (Korean edition, 케라스 창시자에게 배우는 딥러닝 2판)
hfxunlp/transformer
Neutron: A PyTorch-based implementation of Transformer and its variants.
andreped/GradientAccumulator
Gradient Accumulation for TensorFlow 2
predsci/POT3D
POT3D: High Performance Potential Field Solver
kuixu/keras_multi_gpu
Multi-GPU training for Keras
lupantech/dual-mfa-vqa
Co-attending Regions and Detections for VQA.
YukeWang96/MGG_OSDI23
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms.
miguelcarcamov/gpuvmem
GPU Framework for Radio Astronomical Image Synthesis
kentaroy47/pytorch-mgpu-cifar10
Testing multi-GPU for PyTorch
Erfan-Ahmadi/TheForgeExamples
Graphic Techniques Implemented on The Forge API, a cross-platform rendering framework on top of Vulkan, DirectX, Metal
dmarnerides/dlt
Deep Learning Toolbox for Torch
AnimaVR/NeuroSync_Trainer_Lite
A multi-GPU audio2face blendshape AI model trainer for your iPhone ARKit data.
ParCoreLab/CPU-Free-model
Source code for the CPU-Free model - a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond the initial kernel launch.
Shamrock-code/Shamrock
Shamrock Multi-GPU hydrodynamics for astrophysics.
Zhengyu-Li/Deep-Network-Compression-based-on-Student-Teacher-Network-
Deep Neural Network Compression based on Student-Teacher Network
18520339/ml-distributed-training
Reduce the training time of CNNs by leveraging the power of multiple GPUs with two approaches, multi-worker and parameter server training, using TensorFlow 2