Issues
- 10
Adding allreduce for ndarray
#234 opened by Hakuyume - 14
Multi-GPU training hangs
#217 opened by andremoeller - 3
NCCL_ERROR_SYSTEM_ERROR: unhandled system error
#285 opened by Fhrozen - 0
- 1
ChainerMN hangs with Open MPI 3
#221 opened by keisukefukuda - 2
optimizer.setup() created by create_multi_node_optimizer returns an original optimizer
#275 opened by rezoo - 0
- 2
Forcing forkserver spawn earlier
#278 opened by iwiwi - 2
- 3
Non-Blocking Methodology on ChainerMN
#291 opened by fengyuan14 - 1
FP16 support
#277 opened by kuenishi - 6
CUDA streams usage
#287 opened by mshiryaev - 0
- 0
- 2
Port Chainer#4191 or use Chainer's BN implementation
#203 opened by kuenishi - 23
Would you please share hyperparameters of GPUs=4 for resnet50 training with us?
#254 opened by mingxiaoh - 5
Manual selection for gpus in distributed training
#269 opened by 1292765944 - 0
Expose `intra_size`, `inter_rank` and `inter_size` of communicators at readthedocs
#255 opened by iwiwi - 8
Checkpointer doesn't resume current learning rate
#225 opened by Guriido - 1
Handle list of dicts in MultiNodeIterator
#252 opened by kuenishi - 1
Add explanation of methods of communicator to document
#194 opened by iwiwi - 2
Don't initialize global NCCL comm when
#224 opened by undertherain - 0
Provide functions for allreduce
#258 opened by kuenishi - 11
Cannot use other start method for multiprocessing
#204 opened by Guriido - 7
- 2
Asynchronous Allreduce
#241 opened by fengyuan14 - 0
We don't need `models_v1` in ImageNet examples now
#206 opened by iwiwi - 7
- 0
Print warning if inappropriate `start_method` of multiprocessing is used
#211 opened by keisukefukuda - 8
- 1
Guidance on MVAPICH vs OpenMPI
#210 opened by andremoeller - 2
Implementation choice of scatter_dataset function
#208 opened by Guriido - 0
Add Chainer 4.0.0b to Travis
#193 opened by iwiwi - 1
Fix Chainer version requirement
#198 opened by iwiwi - 1
Typos in documentation
#192 opened by Guriido - 2
- 1
Add properties `intra_rank` and `inter_rank` to `CommunicatorBase` (and hence all communicators)
#166 opened by iwiwi - 0
Add CPR to document
#170 opened by iwiwi - 0
Handle empty grads
#142 opened by iwiwi - 4
Error on train_mnist.py
#173 opened by chevalfouk - 1
Refactor `tests` directory
#155 opened by iwiwi - 2
Update supported Chainer version in the document
#162 opened by iwiwi - 0
Remove experimental flag of PureNcclCommunicator
#164 opened by iwiwi - 4
Expand the abbreviation CPR
#151 opened by iwiwi - 0
- 0
Add `chainer.utils.experimental` to `distributed_cpr`
#152 opened by iwiwi - 2
Change class name `Alltoall` to `AllToAll`
#154 opened by iwiwi - 2
- 6
Distribution Efficiency is low on AWS GPU instances
#131 opened by sonots - 2
Can't pickle Transaction objects
#129 opened by Aixile