roboschool affects pytorch.multiprocessing
ShangtongZhang opened this issue · 1 comments
ShangtongZhang commented
I'm using OS X 10.12. My PyTorch version is 0.2.0_3 and my Python version is
3.6.3 (default, Oct 4 2017, 06:09:15)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]
See the following minimal example:
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
import torch.multiprocessing as mp
import time
import sys
import roboschool

print(torch.__version__)
print(sys.version)

# batch_size = 64
batch_size = 128

def train(id):
    # Each worker runs forward passes through a small linear layer forever.
    while True:
        fc = nn.Linear(5, 100)
        x = Variable(torch.FloatTensor(np.random.rand(batch_size, 5)), volatile=True)
        y = fc(x)

num_workers = 8
ps = [mp.Process(target=train, args=(i, )) for i in range(num_workers)]
for p in ps: p.start()

# Poll the workers once per second and restart the first one found dead.
while True:
    time.sleep(1)
    for i, p in enumerate(ps):
        if not p.is_alive():
            print('Worker %d exited unexpectedly.' % i)
            p.terminate()
            ps[i] = mp.Process(target=train, args=(i, ))
            ps[i].start()
            print('Worker %d restarted.' % i)
            break

# Never reached, because the monitoring loop above does not exit.
for p in ps: p.join()
It will output:
0.2.0_3
3.6.3 (default, Oct 4 2017, 06:09:15)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]
Worker 0 exited unexpectedly.
Worker 0 restarted.
Worker 0 exited unexpectedly.
Worker 0 restarted.
Worker 0 exited unexpectedly.
Worker 0 restarted.
If I change the batch size to 64, then it works well.
If I don't import roboschool, both 64 and 128 work well.
I noticed a similar issue in #53, but mp.set_start_method('spawn') doesn't work for me.
olegklimov commented
I have no idea. Sometimes esoteric things happen. Yes, as in the issue you linked, it can be the libraries themselves or the order in which they are loaded. Let me know if you find a workaround that works for most people.
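One thing that might help narrow this down is reporting why the worker died, since the script above only reports that it did. Below is a minimal sketch, not a fix: it relies on the standard behaviour of `multiprocessing.Process.exitcode`, which is `None` while the process is running and `-N` when the child was terminated by signal `N`. The helper name `describe_exit` is hypothetical and not part of PyTorch or roboschool.

```python
import signal


def describe_exit(exitcode):
    """Turn a Process.exitcode value into a readable message (hypothetical helper)."""
    if exitcode is None:
        return "still running"
    if exitcode == 0:
        return "exited cleanly"
    if exitcode > 0:
        return "exited with status %d" % exitcode
    # A negative exit code -N means the child was terminated by signal N.
    try:
        name = signal.Signals(-exitcode).name
    except ValueError:
        name = "unknown signal"
    return "killed by %s (%d)" % (name, -exitcode)
```

In the monitoring loop of the reproduction script, the first print could then become `print('Worker %d exited unexpectedly: %s' % (i, describe_exit(p.exitcode)))`. A signal name there (as opposed to a plain non-zero status) would suggest a crash in native code rather than a Python exception.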