carpedm20/ENAS-pytorch

Errors when running

axiniu opened this issue · 2 comments

@dukebw, hi, thanks for your work. When I run this code, I run into some problems.

  1. When I run it using run.sh with the default settings, I get:
    THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
    Traceback (most recent call last):
    File "main.py", line 48, in <module>
    main(args)
    File "main.py", line 30, in main
    trnr = trainer.Trainer(args, dataset)
    File "/home/axi/ENAS-pytorch-master-3/trainer.py", line 160, in __init__
    self.build_model()
    File "/home/axi/ENAS-pytorch-master-3/trainer.py", line 192, in build_model
    self.shared.cuda()
    File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 147, in cuda
    return self._apply(lambda t: t.cuda(device_id))
    File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
    File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 118, in _apply
    module._apply(fn)
    File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 124, in _apply
    param.data = fn(param.data)
    File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 147, in <lambda>
    return self._apply(lambda t: t.cuda(device_id))
    File "/home/axi/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 66, in cuda
    return new_type(self.size()).copy_(self, async)
    RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THC/generic/THCStorage.cu:66

This happens even though I have 3 GPUs and 10 GB of memory.
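One thing worth checking for an out-of-memory error raised at `self.shared.cuda()` is whether the default GPU (device 0) is already occupied by another process. The following is a minimal sketch, assuming `nvidia-smi` is installed; the helper names `parse_free_memory`, `pick_freest_gpu`, and `select_freest_gpu` are hypothetical and not part of ENAS-pytorch:

```python
import os
import subprocess

# nvidia-smi query that prints one "index, free_MiB" line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=index,memory.free",
         "--format=csv,noheader,nounits"]


def parse_free_memory(csv_text):
    """Parse 'index, free_MiB' lines into a {gpu_index: free_MiB} dict."""
    free = {}
    for line in csv_text.strip().splitlines():
        index, mib = (field.strip() for field in line.split(","))
        free[int(index)] = int(mib)
    return free


def pick_freest_gpu(csv_text):
    """Return the index of the GPU that reports the most free memory."""
    free = parse_free_memory(csv_text)
    return max(free, key=free.get)


def select_freest_gpu():
    """Restrict CUDA to the freest GPU; call this before importing torch."""
    out = subprocess.check_output(QUERY, text=True)
    gpu = pick_freest_gpu(out)
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu
```

If the freest GPU still reports very little free memory, the cause is more likely a process that is already holding it than the model size.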
2. When I run it using: python main.py --network_type cnn --dataset cifar --controller_optim momentum --controller_lr_cosine=True --controller_lr_max 0.05 --controller_lr_min 0.0001 --entropy_coeff 0.1, I get:
2018-04-29 19:01:57,957:INFO::[*] Make directories : logs/cifar_2018-04-29_19-01-57
Traceback (most recent call last):
File "main.py", line 48, in <module>
main(args)
File "main.py", line 26, in main
dataset = data.image.Image(args.data_path)
File "/home/axi/ENAS-pytorch-master-2/data/image.py", line 8, in __init__
if args.datset == 'cifar10':
AttributeError: 'str' object has no attribute 'datset'
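The last frames of this traceback point at two separate problems: `main.py` passes `args.data_path` (a string) to `Image`, while the constructor reads an attribute from its argument, and that attribute is spelled `datset`. The command above also passes `--dataset cifar`, whereas the check compares against `'cifar10'`. A minimal sketch of the likely fix, using a hypothetical stand-in class rather than the repository's actual `data/image.py`:

```python
from argparse import Namespace


class Image:
    """Hypothetical stand-in for data/image.py; not the repository's code."""

    def __init__(self, args):
        # Read the correctly spelled attribute from the parsed arguments.
        if args.dataset == "cifar10":
            self.name = "cifar10"
        else:
            raise NotImplementedError(f"Unknown dataset: {args.dataset}")


args = Namespace(dataset="cifar10", data_path="data")

# Passing the path string reproduces the same kind of error as the traceback.
try:
    Image(args.data_path)
except AttributeError as err:
    message = str(err)

# Passing the whole argument namespace works.
dataset = Image(args)
```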
After making some changes, I get other errors, such as:
2018-04-29 18:49:24,745:INFO::[*] Make directories : logs/cifar10_2018-04-29_18-49-24
Files already downloaded and verified
2018-04-29 18:49:27,464:INFO::regularizing:
Traceback (most recent call last):
File "main.py", line 48, in <module>
main(args)
File "main.py", line 30, in main
trnr = trainer.Trainer(args, dataset)
File "/home/axi/ENAS-pytorch-master-1/trainer.py", line 139, in __init__
self.cuda)
File "/home/axi/ENAS-pytorch-master-1/utils.py", line 148, in batchify
data = data.narrow(0, 0, nbatch * bsz)
AttributeError: 'DataLoader' object has no attribute 'narrow'
or
2018-04-29 18:22:50,192:INFO::[*] Make directories : logs/cifar10_2018-04-29_18-22-50
Files already downloaded and verified
2018-04-29 18:22:55,041:INFO::regularizing:
Traceback (most recent call last):
File "main.py", line 48, in <module>
main(args)
File "main.py", line 30, in main
trnr = trainer.Trainer(args, dataset)
File "/home/axi/ENAS-pytorch-master-1/trainer.py", line 139, in __init__
self.cuda)
File "/home/axi/ENAS-pytorch-master-1/utils.py", line 147, in batchify
nbatch = data.size // bsz
AttributeError: 'DataLoader' object has no attribute 'size'
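Both of these tracebacks end in `utils.batchify`, which calls `size` and `narrow`. These are tensor methods, so the function appears to expect one long tensor of token ids (the language-model path) rather than a `DataLoader` of images. Below is a plain-Python sketch of what that kind of `batchify` typically computes, assuming it follows the common trim-and-reshape pattern; it is an illustration with lists, not the repository's implementation:

```python
def batchify(tokens, bsz):
    """Trim a flat token sequence and lay it out as bsz parallel columns."""
    nbatch = len(tokens) // bsz        # number of full rows per column
    trimmed = tokens[:nbatch * bsz]    # what narrow(0, 0, nbatch * bsz) keeps
    # Split into bsz contiguous chunks, then transpose so each chunk
    # becomes one column of the result.
    chunks = [trimmed[i * nbatch:(i + 1) * nbatch] for i in range(bsz)]
    return [list(row) for row in zip(*chunks)]


# Ten tokens with batch size 3: the last token is dropped.
batches = batchify(list(range(10)), 3)
```

An image `DataLoader` already yields batches, so under this assumption the CIFAR path would need to skip `batchify` rather than adapt the loader to it.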

Would you please tell me what changes I should make before I run the code? Thanks for your response.

I have the same problem. Did you solve it?

kukby commented

@axiniu I have the same problem. Did you solve it?