dluvizon/deephar

AttributeError in 2D pose estimation

Closed this issue · 6 comments

Hi all,

While running python3 exp/mpii/eval_mpii_singleperson.py output/eval-mpii on Ubuntu, I get the following error:

(venv_project) xxx@xxx-HP-Pavilion-Laptop-15-cs1xxx:~/project_deep$ python3 exp/mpii/eval_mpii_singleperson.py output/eval-mpii
Initializing deephar v.0.4.1
CUDA_VISIBLE_DEVICES not defined
Using TensorFlow backend.
No module named 'mpl_toolkits'
Using keras version "2.1.4"
2019-02-24 18:23:54.237943: I tensorflow/core/platform/cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-02-24 18:23:54.461427: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:898] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-02-24 18:23:54.463161: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1212] Found device 0 with properties:
name: GeForce MX150 major: 6 minor: 1 memoryClockRate(GHz): 1.5315
pciBusID: 0000:02:00.0
totalMemory: 3.95GiB freeMemory: 3.32GiB
2019-02-24 18:23:54.463201: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1312] Adding visible gpu devices: 0
2019-02-24 18:23:54.912986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:993] Creating TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 3044 MB memory) -> physical GPU (device: 0, name: GeForce MX150, pci bus id: 0000:02:00.0, compute capability: 6.1)
Traceback (most recent call last):
File "exp/mpii/eval_mpii_singleperson.py", line 74, in <module>
eval_singleperson_pckh(model, x_val, p_val[:,:,0:2], afmat_val, head_val)
File "/home/xxx/project_deep/exp/common/mpii_tools.py", line 86, in eval_singleperson_pckh
pred = model.predict(inputs, batch_size=batch_size, verbose=1)
File "/home/xxx/.pyenv/versions/venv_project/lib/python3.6/site-packages/keras/engine/training.py", line 1842, in predict
verbose=verbose, steps=steps)
File "/home/xxx/.pyenv/versions/venv_project/lib/python3.6/site-packages/keras/engine/training.py", line 1292, in _predict_loop
stateful_metrics=self.stateful_metric_names)
AttributeError: 'Model' object has no attribute 'stateful_metric_names'

How can I fix it?

Hi @pratishthavrm,

This seems to be Keras issue #9394.
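For anyone hitting the same error, here is a possible workaround, sketched under an assumption that is not confirmed in this thread: that the model was never compiled before predict was called, and that in Keras 2.1.4 the stateful_metric_names attribute is only created inside compile. The helper name ensure_compiled and the optimizer and loss values are placeholders for illustration, not part of deephar. Upgrading Keras may also resolve it.

```python
def ensure_compiled(model, optimizer="sgd", loss="mse"):
    """Compile `model` only if the attribute that predict() looks up is missing.

    Assumption: model.compile() is what creates `stateful_metric_names`.
    The optimizer and loss are arbitrary placeholders; they do not affect
    the values returned by predict().
    Returns True if compile() was called, False otherwise.
    """
    if not hasattr(model, "stateful_metric_names"):
        model.compile(optimizer=optimizer, loss=loss)
        return True
    return False


# Hypothetical usage, just before the evaluation call in the script:
#   ensure_compiled(model)
#   eval_singleperson_pckh(model, x_val, p_val[:, :, 0:2], afmat_val, head_val)
```

The check is deliberately narrow: an already compiled model is left untouched, so any training configuration it carries is preserved.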

Thanks a lot.

Hi,
While running train_mpii_singleperson.py, I get the following error:
W tensorflow/core/framework/op_kernel.cc:1431] OP_REQUIRES failed at constant_op.cc:170 : Resource exhausted: OOM when allocating tensor with shape[20,576,32,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
File "/home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1335, in _do_call
return fn(*args)
File "/home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1320, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1408, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[20,576,32,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[{{node SepConv2/batch_normalization_16/FusedBatchNorm}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[loss/concatenate_12_loss/Mean_3/_3811]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/#####/.vscode/extensions/ms-python.python-2019.2.5433/pythonFiles/ptvsd_launcher.py", line 45, in <module>
main(ptvsdArgs)
File "/home/#####/.vscode/extensions/ms-python.python-2019.2.5433/pythonFiles/lib/python/ptvsd/main.py", line 357, in main
run()
File "/home/#####/.vscode/extensions/ms-python.python-2019.2.5433/pythonFiles/lib/python/ptvsd/main.py", line 257, in run_file
runpy.run_path(target, run_name='__main__')
File "/usr/lib/python3.5/runpy.py", line 254, in run_path
pkg_name=pkg_name, script_name=fname)
File "/usr/lib/python3.5/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/#####/project_deeper/exp/mpii/train_mpii_singleperson.py", line 97, in <module>
initial_epoch=0)
File "/home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/engine/training.py", line 2244, in fit_generator
class_weight=class_weight)
File "/home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/engine/training.py", line 1890, in train_on_batch
outputs = self.train_function(ins)
File "/home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 2475, in __call__
**self.session_kwargs)
File "/home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 930, in run
run_metadata_ptr)
File "/home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1153, in _run
feed_dict_tensor, options, run_metadata)
File "/home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1329, in _do_run
run_metadata)
File "/home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/client/session.py", line 1349, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[20,576,32,32] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
[[node SepConv2/batch_normalization_16/FusedBatchNorm (defined at home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:1799) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

     [[loss/concatenate_12_loss/Mean_3/_3811]]

Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Errors may have originated from an input operation.
Input Source operations connected to node SepConv2/batch_normalization_16/FusedBatchNorm:
SepConv2/separable_conv2d_6/separable_conv2d (defined at home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:3478)
batch_normalization_16/beta/read (defined at home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:392)
SepConv2/batch_normalization_16/Const (defined at home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py:1788)

Original stack trace for 'SepConv2/batch_normalization_16/FusedBatchNorm':
File "home/#####/.vscode/extensions/ms-python.python-2019.2.5433/pythonFiles/ptvsd_launcher.py", line 45, in <module>
main(ptvsdArgs)
File "home/#####/.vscode/extensions/ms-python.python-2019.2.5433/pythonFiles/lib/python/ptvsd/main.py", line 357, in main
run()
File "home/#####/.vscode/extensions/ms-python.python-2019.2.5433/pythonFiles/lib/python/ptvsd/main.py", line 257, in run_file
runpy.run_path(target, run_name='__main__')
File "usr/lib/python3.5/runpy.py", line 254, in run_path
pkg_name=pkg_name, script_name=fname)
File "usr/lib/python3.5/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "usr/lib/python3.5/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "home/#####/project_deeper/exp/mpii/train_mpii_singleperson.py", line 51, in <module>
num_blocks=num_blocks, num_context_per_joint=2, ksize=(5, 5))
File "home/#####/project_deeper/deephar/models/reception.py", line 285, in build
x = build_sconv_block(x, name='SepConv%d' % (bidx + 1), ksize=ksize)
File "home/#####/project_deeper/deephar/models/reception.py", line 142, in build_sconv_block
return model(inp)
File "home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/engine/topology.py", line 617, in __call__
output = self.call(inputs, **kwargs)
File "home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/engine/topology.py", line 2081, in call
output_tensors, _, _ = self.run_internal_graph(inputs, masks)
File "home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/engine/topology.py", line 2232, in run_internal_graph
output_tensors = _to_list(layer.call(computed_tensor, **kwargs))
File "home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/layers/normalization.py", line 181, in call
epsilon=self.epsilon)
File "home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1824, in normalize_batch_in_training
epsilon=epsilon)
File "home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/keras/backend/tensorflow_backend.py", line 1799, in _fused_normalize_batch_in_training
data_format=tf_data_format)
File "home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/ops/nn_impl.py", line 1206, in fused_batch_norm
name=name)
File "home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 3946, in _fused_batch_norm
name=name)
File "home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 800, in _apply_op_helper
op_def=op_def)
File "home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 3479, in create_op
op_def=op_def)
File "home/#####/virtualenvironment/project_2/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1961, in __init__
self._traceback = tf_stack.extract_stack()

Please tell me how to fix this.

Please note in your log:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM

The GPU ran out of memory. You need a GPU with more memory or a smaller batch size to train the model.

Thanks a lot.
But what are the minimum system requirements for MPII training?

There is no single answer to that question.
By reducing the batch size, you should be able to train it even on small GPUs.
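To give a rough sense of how batch size affects memory, here is a small back-of-the-envelope sketch based on the tensor from the OOM message above, shape[20,576,32,32] of type float. It assumes the first dimension (20) is the batch size and that float means 4-byte float32. The function names are hypothetical. This only sizes one activation tensor; the network holds many such tensors plus weights and gradients, so it is not an estimate of total GPU usage.

```python
from math import prod  # Python 3.8+

BYTES_PER_FLOAT32 = 4


def tensor_mib(shape, bytes_per_element=BYTES_PER_FLOAT32):
    """Size in MiB of a dense tensor with the given shape."""
    return prod(shape) * bytes_per_element / 2**20


def scaled_shape(shape, batch_size):
    """Same tensor shape with the leading (assumed batch) dimension replaced."""
    return (batch_size,) + tuple(shape[1:])


# Shape reported in the ResourceExhaustedError message.
oom_shape = (20, 576, 32, 32)

for batch in (20, 10, 5, 2):
    size = tensor_mib(scaled_shape(oom_shape, batch))
    print(f"batch_size={batch:2d}: {size:.2f} MiB for this one tensor")
```

The size of this tensor scales linearly with the batch size (45 MiB at 20, 22.5 MiB at 10), which is why halving the batch size until training fits is usually the first thing to try.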