multi-gpu

There are 95 repositories under the multi-gpu topic.

ConfettiFX/The-Forge
The Forge Cross-Platform Framework: PC Windows, Steam Deck (native), Ray Tracing, macOS / iOS, Android, XBOX, PS4, PS5, Switch, Quest 2
Language: C++ 5.3k 181 187548
NVIDIA/OpenSeq2Seq
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Language: Python 1.6k 86 256371
v-iashin/video_features
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
Language: Python 624 5 80102
rbbrdckybk/dream-factory
Multi-threaded GUI manager for mass creation of AI-generated art with support for multiple GPUs.
Language: Python 499 11 5755
seasonSH/DocFace
Face recognition system for ID photos
Language: Python 379 18 32123
NickLucche/stable-diffusion-nvidia-docker
GPU-ready Dockerfile to run the Stability.AI stable-diffusion model v2 with a simple web interface. Includes multi-GPU support.
Language: Python 369 11 2842
omlins/ParallelStencil.jl
Package for writing high-level code for parallel high-performance stencil computations that can be deployed on both GPUs and CPUs
Language: Julia 368 10 6541
lattice/quda
QUDA is a library for performing calculations in lattice QCD on GPUs.
Language: C++ 330 60 699109
FZJ-JSC/tutorial-multi-gpu
Efficient Distributed GPU Programming for Exascale, an SC/ISC Tutorial
Language: Cuda 318 13 665
tamerthamoqa/facenet-pytorch-glint360k
A PyTorch implementation of the 'FaceNet' paper for training a facial recognition model with Triplet Loss using the glint360k dataset. A pre-trained model using Triplet Loss is available for download.
Language: Python 249 6 2362
helmholtz-analytics/heat
Distributed tensors and Machine Learning framework with GPU and MPI acceleration in Python
Language: Python 225 7 78757
bharatsingh430/py-R-FCN-multiGPU
Code for training py-faster-rcnn and py-R-FCN on multiple GPUs in Caffe
Language: Jupyter Notebook 192 16 2996
eth-cscs/ImplicitGlobalGrid.jl
Almost trivial distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid
Language: Julia 192 12 2521
GPUSPH/gpusph
The world's first CUDA implementation of Weakly-Compressible Smoothed Particle Hydrodynamics
Language: C++ 182 34 7666
papuSpartan/stable-diffusion-webui-distributed
Chains stable-diffusion-webui instances together to facilitate faster image generation.
Language: Python 182 5 2214
guotong1988/BERT-pre-training
Multi-GPU pre-training for BERT on one machine without Horovod (data parallelism)
Language: Python 171 7 3554
celerity/celerity-runtime
High-level C++ for Accelerator Clusters
Language: C++ 153 10 1520
tensordiffeq/TensorDiffEq
Efficient and Scalable Physics-Informed Deep Learning and Scientific Machine Learning on top of TensorFlow for multi-worker distributed computing
Language: Python 116 6 1143
projectchrono/DEM-Engine
A dual-GPU DEM solver with complex grain geometry support
Language: C++ 98 3 427
tugrul512bit/Cekirdekler
Multi-device OpenCL kernel load balancer and pipeliner API for C#. Uses a shared-distributed memory model to keep GPUs updated quickly while running the same kernel on all devices (for simplicity).
Language: C# 96 15 5610
rickiepark/deep-learning-with-python-2nd
Code repository for the Korean edition of the book Deep Learning with Python, 2nd Edition (케라스 창시자에게 배우는 딥러닝 2판)
Language: Jupyter Notebook 81 2 198
hfxunlp/transformer
Neutron: A PyTorch-based implementation of the Transformer and its variants.
Language: Python 64 7 39
andreped/GradientAccumulator
Gradient Accumulation for TensorFlow 2
Language: Python 53 2 7511
predsci/POT3D
POT3D: High Performance Potential Field Solver
Language: Fortran 47 9 828
Shamrock-code/Shamrock
The Shamrock Framework, an open-source, multi-GPU hydrodynamics framework for astrophysics. Scales seamlessly from laptops to exascale supercomputers, supporting SPH, AMR, and more.
Language: C++ 45 3 26716
kuixu/keras_multi_gpu
Multi-GPU training for Keras
Language: Python 44 4 522
lupantech/dual-mfa-vqa
Co-attending Regions and Detections for VQA.
Language: MATLAB 40 5 314
YukeWang96/MGG_OSDI23
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Multi-GPU Platforms.
Language: Cuda 40 1 55
miguelcarcamov/gpuvmem
GPU Framework for Radio Astronomical Image Synthesis
Language: Cuda 29 5 33
hainuo-wang/XReflection
XReflection is a neat toolbox tailored for single-image reflection removal (SIRR). We offer state-of-the-art SIRR solutions for training and inference, with a high-performance data pipeline, multi-GPU/TPU/NPU support, and more!
Language: Python 28
kentaroy47/pytorch-mgpu-cifar10
Testing multi-GPU training for PyTorch
Language: Python 26 1 09
Erfan-Ahmadi/TheForgeExamples
Graphics techniques implemented on The Forge API, a cross-platform rendering framework on top of Vulkan, DirectX, and Metal
Language: C++ 25 1 14
dmarnerides/dlt
Deep Learning Toolbox for Torch
Language: Lua 21 1 02
ParCoreLab/CPU-Free-model
Source code for the CPU-Free model, a fully autonomous execution model for multi-GPU applications that completely excludes the involvement of the CPU beyond the initial kernel launch.
Language: Cuda 21 3 43
Zhengyu-Li/Deep-Network-Compression-based-on-Student-Teacher-Network-
Deep Neural Network Compression based on Student-Teacher Network
Language: Python 14 2 12
18520339/ml-distributed-training
Reduce the training time of CNNs by leveraging multiple GPUs with two approaches, Multi-Worker and Parameter Server training, using TensorFlow 2
Language: Jupyter Notebook 12 1 23
