run error

Question

Closed this issue 6 years ago · 4 comments

Tested on EC2 g2.2xlarge and p2.xlarge instances.

pip install plat:
Successfully installed plat-0.2.2

plat sample --model celeba_64.discgen
Loading DiscGenModel interface from discgen.interface
Loading model celeba_64.discgen

Using gpu device 0: GRID K520 (CNMeM is disabled, cuDNN 4007)
Model loaded.
Building computation graph...
Compiling sampling function...
Traceback (most recent call last):
File "/home/ubuntu/anaconda2/envs/tensorflow/bin/plat", line 11, in <module>
sys.exit(main())
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/plat/bin/platcmd.py", line 12, in main
handler.run()
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/arghandler/base.py", line 295, in run
self._subcommand_lookupargs.cmd
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/plat/bin/sample.py", line 256, in sample
run_with_args(args, dmodel, args.anchor_image, args.save_path, cur_z_step)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/plat/bin/sample.py", line 138, in run_with_args
plat.sampling.grid_from_latents(z, dmodel, args.rows, args.cols, anchor_images, args.tight, args.shoulders, cur_save_path, args, args.batch_size)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/plat/sampling.py", line 108, in grid_from_latents
decoded = dmodel.sample_at(cur_z)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/discgen/interface.py", line 79, in sample_at
latents, samples = self.sampling_function(z_float)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/compile/function_module.py", line 871, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
ValueError: CorrMM received weight with wrong type.
Apply node that caused the error: CorrMM{(1, 1), (1, 1)}(decoder_convnet_apply_args_0, Subtensor{::, ::, ::int64, ::int64}.0)
Toposort index: 158
Inputs types: [TensorType(float64, 4D), TensorType(float32, 4D)]
Inputs shapes: [(21, 256, 4, 4), (256, 256, 3, 3)]
Inputs strides: [(32768, 128, 32, 8), (9216, 36, -12, -4)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Elemwise{Composite{(i0 * (Composite{((((i0 - i1) / i2) * i3) + i4)}(i1, i2, i3, i4, i5) + Abs(Composite{((((i0 - i1) / i2) * i3) + i4)}(i1, i2, i3, i4, i5))))}}[(0, 1)](TensorConstant{(1, 1, 1, 1) of 0.5}, conv1_apply_output, shape_padleft(population_mean), shape_padleft(population_stdev), shape_padleft(batch_norm_scale), shape_padleft(batch_norm_shift))]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/discgen/interface.py", line 76, in sample_at
self.sampling_function = get_decoder_function(self.model)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/discgen/interface.py", line 39, in get_decoder_function
(-1,) + decoder_convnet.get_dim('input_')))
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/blocks/bricks/base.py", line 371, in __call__
return self.application.apply(self, *args, **kwargs)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/blocks/bricks/base.py", line 306, in apply
outputs = self.application_function(brick, *args, **kwargs)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/blocks/bricks/sequences.py", line 37, in apply
output = application_method(*pack(child_input))
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/blocks/bricks/base.py", line 371, in __call__
return self.application.apply(self, *args, **kwargs)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/blocks/bricks/base.py", line 306, in apply
outputs = self.application_function(brick, *args, **kwargs)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/blocks/bricks/conv.py", line 144, in apply
self.filter_size))

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
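
One detail the traceback does state is that the two inputs to the failing CorrMM node have different dtypes (float64 and float32). The sketch below is only a log-reading aid and is not part of plat, discgen, or Theano; the helper name `input_dtypes` is made up for illustration, and it assumes the inputs are printed in the `TensorType(<dtype>, <ndim>D)` form seen above.

```python
import re


def input_dtypes(traceback_text):
    """Return the dtypes listed on the 'Inputs types:' line of a Theano error.

    Hypothetical helper: it only parses text that looks like the log above.
    """
    for line in traceback_text.splitlines():
        if line.startswith("Inputs types:"):
            # Each input is printed as TensorType(<dtype>, <ndim>D).
            return re.findall(r"TensorType\((\w+),", line)
    return []


# Excerpt copied from the traceback in this report.
log = """ValueError: CorrMM received weight with wrong type.
Inputs types: [TensorType(float64, 4D), TensorType(float32, 4D)]
Inputs shapes: [(21, 256, 4, 4), (256, 256, 3, 3)]"""

dtypes = input_dtypes(log)
mismatch = len(set(dtypes)) > 1
print(dtypes, "mismatch" if mismatch else "consistent")
```

Whether the mismatch comes from the serialized model, the installed Theano version, or a configuration setting is not established by the log itself.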

Answer 1 · 2016-12-16T03:08:56.000Z

plat sample --model celeba_64.discgen
Loading DiscGenModel interface from discgen.interface
Loading model celeba_64.discgen

Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 4007)

Model loaded.
Building computation graph...
Compiling sampling function...
Traceback (most recent call last):
File "/home/ubuntu/anaconda2/envs/tensorflow/bin/plat", line 11, in <module>
sys.exit(main())
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/plat/bin/platcmd.py", line 12, in main
handler.run()
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/arghandler/base.py", line 295, in run
self._subcommand_lookupargs.cmd
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/plat/bin/sample.py", line 256, in sample
run_with_args(args, dmodel, args.anchor_image, args.save_path, cur_z_step)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/plat/bin/sample.py", line 138, in run_with_args
plat.sampling.grid_from_latents(z, dmodel, args.rows, args.cols, anchor_images, args.tight, args.shoulders, cur_save_path, args, args.batch_size)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/plat/sampling.py", line 108, in grid_from_latents
decoded = dmodel.sample_at(cur_z)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/discgen/interface.py", line 79, in sample_at
latents, samples = self.sampling_function(z_float)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/compile/function_module.py", line 871, in __call__
storage_map=getattr(self.fn, 'storage_map', None))
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/gof/link.py", line 314, in raise_with_op
reraise(exc_type, exc_value, exc_trace)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/theano/compile/function_module.py", line 859, in __call__
outputs = self.fn()
ValueError: CorrMM received weight with wrong type.
Apply node that caused the error: CorrMM{(1, 1), (1, 1)}(decoder_convnet_apply_args_0, Subtensor{::, ::, ::int64, ::int64}.0)
Toposort index: 158
Inputs types: [TensorType(float64, 4D), TensorType(float32, 4D)]
Inputs shapes: [(21, 256, 4, 4), (256, 256, 3, 3)]
Inputs strides: [(32768, 128, 32, 8), (9216, 36, -12, -4)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[Elemwise{Composite{(i0 * (Composite{((((i0 - i1) / i2) * i3) + i4)}(i1, i2, i3, i4, i5) + Abs(Composite{((((i0 - i1) / i2) * i3) + i4)}(i1, i2, i3, i4, i5))))}}[(0, 1)](TensorConstant{(1, 1, 1, 1) of 0.5}, conv1_apply_output, shape_padleft(population_mean), shape_padleft(population_stdev), shape_padleft(batch_norm_scale), shape_padleft(batch_norm_shift))]]

Backtrace when the node is created(use Theano flag traceback.limit=N to make it longer):
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/discgen/interface.py", line 76, in sample_at
self.sampling_function = get_decoder_function(self.model)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/discgen/interface.py", line 39, in get_decoder_function
(-1,) + decoder_convnet.get_dim('input_')))
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/blocks/bricks/base.py", line 371, in __call__
return self.application.apply(self, *args, **kwargs)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/blocks/bricks/base.py", line 306, in apply
outputs = self.application_function(brick, *args, **kwargs)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/blocks/bricks/sequences.py", line 37, in apply
output = application_method(*pack(child_input))
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/blocks/bricks/base.py", line 371, in __call__
return self.application.apply(self, *args, **kwargs)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/blocks/bricks/base.py", line 306, in apply
outputs = self.application_function(brick, *args, **kwargs)
File "/home/ubuntu/anaconda2/envs/tensorflow/lib/python2.7/site-packages/blocks/bricks/conv.py", line 144, in apply
self.filter_size))

HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.

Answer 2 · 2016-12-23T08:28:41.000Z

Thanks for your report. It looks like a problem deserializing the model. Do you know which version of Theano you were using? The serialized discgen models depend on a specific version of Theano, which is installed as part of the discgen install process. To switch to that version, I believe you can run:

pip install --no-dependencies --upgrade git+https://github.com/Theano/Theano.git@a3bbfb8c#egg=theano

If it's not the Theano version, it might instead be an issue with cuDNN (I'm running 5105) or some other dependency.
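
To answer the version question quickly, something like the following may help. It is a minimal sketch, not part of plat or discgen: the helper name `installed_version` is hypothetical, and the distribution names in the loop are assumed to match the package directories shown in the traceback.

```python
def installed_version(package):
    """Return the installed version string for `package`, or None if absent."""
    try:
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Older interpreters (such as the Python 2.7 environment in the
        # traceback) lack importlib.metadata; fall back to setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names below are assumptions based on the traceback paths.
for name in ("Theano", "blocks", "plat", "discgen"):
    print(name, installed_version(name))
```

Run it with the same interpreter that runs `plat`, so it reports what that environment actually has installed.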

Answer 3 · 2017-02-17T08:20:18.000Z

Same error on EC2 (Tesla K80). The Theano version is theano-0.9.0.dev1, the CUDA version is 7.5, and cuDNN is 5110.
I installed Theano with:
pip install git+https://github.com/Theano/Theano.git@a3bbfb8c#egg=theano

Answer 4 · 2018-04-01T08:16:45.000Z

Sorry, I wasn't able to solve the serialization issues. I'm in the process of refreshing the library, including updating the built-in models to use TensorFlow Hub, so hopefully this will work better in the next released version.