google-research/deeplab2

TypeError: Failed to convert object of type <class 'tuple'> to Tensor. Contents: (None, 1, 1, 1)

davidblom603 opened this issue · 3 comments

The function at deeplab2/model/layers/drop_path.py:78 throws a TypeError. What I find very confusing is that training works if the crop size or batch size is changed.
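For context, here is a minimal sketch of what appears to go wrong (hypothetical code, not the actual deeplab2 implementation): inside a `tf.function` trace the static batch dimension can be `None`, so building the mask shape from the static shape yields the tuple `(None, 1, 1, 1)`, which `tf.random.uniform` cannot convert to a tensor. Reading the batch size via `tf.shape(x)` instead produces a runtime scalar tensor and traces cleanly even with an unknown batch size:

```python
import tensorflow as tf

def drop_path_mask(x, keep_prob):
    # Hypothetical sketch: build the (batch, 1, 1, 1) drop-path mask from the
    # *runtime* shape. tf.shape(x)[0] is a scalar int tensor even when the
    # static batch dimension is None, whereas x.shape[0] would be None and
    # make tf.random.uniform fail as in the traceback in this report.
    batch = tf.shape(x)[0]
    random_tensor = keep_prob + tf.random.uniform([batch, 1, 1, 1], dtype=x.dtype)
    return tf.floor(random_tensor)  # 1.0 keeps the path, 0.0 drops it

# Tracing with an unknown (None) batch dimension reproduces the situation
# from the error message; with tf.shape it traces without a TypeError.
traced = tf.function(drop_path_mask).get_concrete_function(
    tf.TensorSpec([None, 65, 129, 3], tf.float32),
    tf.TensorSpec([], tf.float32))
```

This is only an illustration of the static-vs-dynamic shape distinction; the actual fix in deeplab2 may differ.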

[stderr]I0128 07:15:55.863254 140012254295872 train.py:106] Reading the config file.
[stderr]I0128 07:15:55.867749 140012254295872 train.py:87] Download checkpoint https://storage.googleapis.com/gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_4.5_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz
[stderr]I0128 07:15:55.871759 140012254295872 connectionpool.py:975] Starting new HTTPS connection (1): storage.googleapis.com:443
[stderr]I0128 07:15:56.443871 140012254295872 connectionpool.py:461] https://storage.googleapis.com:443 "GET /gresearch/tf-deeplab/checkpoint/swidernet_sac_1_1_4.5_os16_panoptic_deeplab_cityscapes_trainfine.tar.gz HTTP/1.1" 200 5988562150
[stderr]I0128 07:18:34.766288 140012254295872 train.py:119] Starting the experiment.
[stderr]2022-01-28 07:18:34.768104: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
[stderr]To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
[stderr]2022-01-28 07:18:37.764929: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 14655 MB memory:  -> device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 2d94:00:00.0, compute capability: 7.0
[stderr]2022-01-28 07:18:37.766537: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 14655 MB memory:  -> device: 1, name: Tesla V100-PCIE-16GB, pci bus id: 434f:00:00.0, compute capability: 7.0
[stderr]2022-01-28 07:18:37.767908: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 14655 MB memory:  -> device: 2, name: Tesla V100-PCIE-16GB, pci bus id: 5a33:00:00.0, compute capability: 7.0
[stderr]2022-01-28 07:18:37.769792: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 14655 MB memory:  -> device: 3, name: Tesla V100-PCIE-16GB, pci bus id: 6ea8:00:00.0, compute capability: 7.0
[stderr]INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
[stderr]I0128 07:18:38.373827 140012254295872 mirrored_strategy.py:369] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0', '/job:localhost/replica:0/task:0/device:GPU:1', '/job:localhost/replica:0/task:0/device:GPU:2', '/job:localhost/replica:0/task:0/device:GPU:3')
[stderr]I0128 07:18:38.375211 140012254295872 train_lib.py:105] Using strategy <class 'tensorflow.python.distribute.mirrored_strategy.MirroredStrategy'> with 4 replicas
[stderr]I0128 07:18:38.389083 140012254295872 deeplab.py:57] Synchronized Batchnorm is used.
[stderr]I0128 07:18:38.390186 140012254295872 axial_resnet_instances.py:144] Axial-ResNet final config: {'num_blocks': [3, 6, 3, 3], 'backbone_layer_multiplier': 4.5, 'width_multiplier': 1.0, 'stem_width_multiplier': 1.0, 'output_stride': 16, 'classification_mode': True, 'backbone_type': 'wider_resnet', 'use_axial_beyond_stride': 0, 'backbone_use_transformer_beyond_stride': 0, 'extra_decoder_use_transformer_beyond_stride': 32, 'backbone_decoder_num_stacks': 0, 'backbone_decoder_blocks_per_stage': 1, 'extra_decoder_num_stacks': 0, 'extra_decoder_blocks_per_stage': 1, 'max_num_mask_slots': 128, 'num_mask_slots': 128, 'memory_channels': 256, 'base_transformer_expansion': 1.0, 'global_feed_forward_network_channels': 256, 'high_resolution_output_stride': 4, 'activation': 'relu', 'block_group_config': {'attention_bottleneck_expansion': 2, 'drop_path_keep_prob': 0.800000011920929, 'drop_path_beyond_stride': 4, 'drop_path_schedule': 'linear', 'positional_encoding_type': None, 'use_global_beyond_stride': 0, 'use_sac_beyond_stride': 32, 'use_squeeze_and_excite': False, 'conv_use_recompute_grad': True, 'axial_use_recompute_grad': True, 'recompute_within_stride': 0, 'transformer_use_recompute_grad': False, 'axial_layer_config': {'query_shape': (129, 129), 'key_expansion': 1, 'value_expansion': 2, 'memory_flange': (32, 32), 'double_global_attention': False, 'num_heads': 8, 'use_query_rpe_similarity': True, 'use_key_rpe_similarity': True, 'use_content_similarity': True, 'retrieve_value_rpe': True, 'retrieve_value_content': True, 'initialization_std_for_query_key_rpe': 1.0, 'initialization_std_for_value_rpe': 1.0, 'self_attention_activation': 'softmax'}, 'dual_path_transformer_layer_config': {'num_heads': 8, 'bottleneck_expansion': 2, 'key_expansion': 1, 'value_expansion': 2, 'feed_forward_network_channels': 2048, 'use_memory_self_attention': True, 'use_pixel2memory_feedback_attention': True, 'transformer_activation': 'softmax'}}, 'bn_layer': functools.partial(<class 
'keras.layers.normalization.batch_normalization.SyncBatchNormalization'>, momentum=0.9900000095367432, epsilon=0.0010000000474974513), 'conv_kernel_weight_decay': 0.0}
[stderr]I0128 07:18:39.049323 140012254295872 deeplab.py:96] Setting pooling size to (65, 129)
[stderr]I0128 07:18:39.049559 140012254295872 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
[stderr]I0128 07:18:39.049677 140012254295872 aspp.py:135] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
[stderr]INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]I0128 07:18:44.511644 140012254295872 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]I0128 07:18:44.513136 140012254295872 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]I0128 07:18:44.517010 140012254295872 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]I0128 07:18:44.518056 140012254295872 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]I0128 07:18:44.522685 140012254295872 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]I0128 07:18:44.523885 140012254295872 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]I0128 07:18:44.527496 140012254295872 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]I0128 07:18:44.528717 140012254295872 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]I0128 07:18:44.533386 140012254295872 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]INFO:tensorflow:Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]I0128 07:18:44.534450 140012254295872 cross_device_ops.py:621] Reduce to /job:localhost/replica:0/task:0/device:CPU:0 then broadcast to ('/job:localhost/replica:0/task:0/device:CPU:0',).
[stderr]I0128 07:18:44.552655 140012254295872 controller.py:395] restoring or initializing model...
[stderr]I0128 07:18:44.659193 140012254295872 controller.py:401] initialized model.
[stderr]I0128 07:18:45.906069 140012254295872 api.py:446] Eval with scales ListWrapper([1.0])
[stderr]I0128 07:18:47.215825 140012254295872 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
[stderr]I0128 07:18:47.244710 140012254295872 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
[stderr]I0128 07:18:47.272035 140012254295872 api.py:446] Eval scale 1.0; setting pooling size to [65, 129]
[stderr]I0128 07:19:05.988396 140012254295872 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
[stderr]I0128 07:19:06.017653 140012254295872 api.py:446] Global average pooling in the ASPP pooling layer was replaced with tiled average pooling using the provided pool_size. Please make sure this behavior is intended.
[stderr]I0128 07:19:11.751652 140012254295872 controller.py:492] saved checkpoint to /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/cap/data-capability/wd/output_d5d1c53e_workspaceblobstore/air_flange_panoptic_segmentation/ckpt-0.
[stderr]I0128 07:19:11.752440 140012254295872 controller.py:237] train | step:      0 | training until step 50000...
[stderr]2022-01-28 07:19:11.881448: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
[stderr]INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]I0128 07:19:12.539248 140012254295872 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]I0128 07:19:12.583310 140012254295872 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]I0128 07:19:12.626576 140012254295872 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]I0128 07:19:13.219306 140012254295872 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]I0128 07:19:13.262344 140012254295872 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]I0128 07:19:13.304924 140012254295872 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]I0128 07:19:13.542769 140012254295872 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]I0128 07:19:13.586726 140012254295872 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]I0128 07:19:13.630722 140012254295872 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]INFO:tensorflow:batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]I0128 07:19:13.883121 140012254295872 cross_device_ops.py:903] batch_all_reduce: 1 all-reduces with algorithm = nccl, num_packs = 1
[stderr]INFO:tensorflow:Error reported to Coordinator: in user code:
[stderr]
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/deeplab.py:135 call  *
[stderr]        result_dict = self._decoder(
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/encoder/axial_resnet.py:761 call  *
[stderr]        current_output, activated_output, memory_feature, endpoints = (
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/encoder/axial_resnet.py:560 call_encoder_before_stacked_decoder  *
[stderr]        current_output, activated_output, memory_feature = (
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/layers/axial_block_groups.py:366 call  *
[stderr]        drop_path_random_mask = drop_path.generate_drop_path_random_mask(
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/layers/drop_path.py:78 generate_drop_path_random_mask  *
[stderr]        random_tensor += tf.random.uniform(
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:206 wrapper  **
[stderr]        return target(*args, **kwargs)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/ops/random_ops.py:296 random_uniform
[stderr]        shape = tensor_util.shape_tensor(shape)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:1080 shape_tensor
[stderr]        return ops.convert_to_tensor(shape, dtype=dtype, name="shape")
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/profiler/trace.py:163 wrapped
[stderr]        return func(*args, **kwargs)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1566 convert_to_tensor
[stderr]        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:346 _constant_tensor_conversion_function
[stderr]        return constant(v, dtype=dtype, name=name)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:272 constant
[stderr]        allow_broadcast=True)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:290 _constant_impl
[stderr]        allow_broadcast=allow_broadcast))
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:553 make_tensor_proto
[stderr]        "supported type." % (type(values), values))
[stderr]
[stderr]    TypeError: Failed to convert object of type <class 'tuple'> to Tensor. Contents: (None, 1, 1, 1). Consider casting elements to a supported type.
[stderr]Traceback (most recent call last):
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/training/coordinator.py", line 297, in stop_on_exception
[stderr]    yield
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/distribute/mirrored_run.py", line 346, in run
[stderr]    self.main_result = self.main_fn(*self.main_args, **self.main_kwargs)
[stderr]  File "/tmp/tmpalp69jov.py", line 13, in step_fn
[stderr]    ag__.converted_call(ag__.ld(self)._train_step, (ag__.ld(inputs),), None, fscope_1)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 446, in converted_call
[stderr]    result = converted_f(*effective_args)
[stderr]  File "/tmp/tmp89who94l.py", line 10, in tf___train_step
[stderr]    outputs = ag__.converted_call(ag__.ld(self)._model, (ag__.ld(inputs)[ag__.ld(common).IMAGE],), dict(training=True), fscope)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 382, in converted_call
[stderr]    return _call_unconverted(f, args, kwargs, options)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 463, in _call_unconverted
[stderr]    return f(*args, **kwargs)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/keras/engine/base_layer.py", line 1037, in __call__
[stderr]    outputs = call_fn(inputs, *args, **kwargs)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 695, in wrapper
[stderr]    raise e.ag_error_metadata.to_exception(e)
[stderr]TypeError: in user code:
[stderr]
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/deeplab.py:135 call  *
[stderr]        result_dict = self._decoder(
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/encoder/axial_resnet.py:761 call  *
[stderr]        current_output, activated_output, memory_feature, endpoints = (
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/encoder/axial_resnet.py:560 call_encoder_before_stacked_decoder  *
[stderr]        current_output, activated_output, memory_feature = (
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/layers/axial_block_groups.py:366 call  *
[stderr]        drop_path_random_mask = drop_path.generate_drop_path_random_mask(
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/layers/drop_path.py:78 generate_drop_path_random_mask  *
[stderr]        random_tensor += tf.random.uniform(
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:206 wrapper  **
[stderr]        return target(*args, **kwargs)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/ops/random_ops.py:296 random_uniform
[stderr]        shape = tensor_util.shape_tensor(shape)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:1080 shape_tensor
[stderr]        return ops.convert_to_tensor(shape, dtype=dtype, name="shape")
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/profiler/trace.py:163 wrapped
[stderr]        return func(*args, **kwargs)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1566 convert_to_tensor
[stderr]        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:346 _constant_tensor_conversion_function
[stderr]        return constant(v, dtype=dtype, name=name)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:272 constant
[stderr]        allow_broadcast=True)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:290 _constant_impl
[stderr]        allow_broadcast=allow_broadcast))
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:553 make_tensor_proto
[stderr]        "supported type." % (type(values), values))
[stderr]
[stderr]    TypeError: Failed to convert object of type <class 'tuple'> to Tensor. Contents: (None, 1, 1, 1). Consider casting elements to a supported type.
[stderr]
[stderr]
[stderr]Traceback (most recent call last):
[stderr]  File "deeplab2/trainer/train.py", line 126, in <module>
[stderr]    app.run(main)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/absl/app.py", line 300, in run
[stderr]    _run_main(main, args)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
[stderr]    sys.exit(main(argv))
[stderr]  File "deeplab2/trainer/train.py", line 122, in main
[stderr]    FLAGS.num_gpus)
[stderr]  File "/mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/trainer/train_lib.py", line 191, in run_experiment
[stderr]    steps=config.trainer_options.solver_options.training_number_of_steps)
[stderr]  File "/mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/orbit/controller.py", line 241, in train
[stderr]    self._train_n_steps(num_steps)
[stderr]  File "/mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/orbit/controller.py", line 443, in _train_n_steps
[stderr]    train_output = self.trainer.train(num_steps_tensor)
[stderr]  File "/mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/orbit/standard_runner.py", line 146, in train
[stderr]    self._train_loop_fn(self._train_iter, num_steps)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
[stderr]    result = self._call(*args, **kwds)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 933, in _call
[stderr]    self._initialize(args, kwds, add_initializers_to=initializers)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 760, in _initialize
[stderr]    *args, **kwds))
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3066, in _get_concrete_function_internal_garbage_collected
[stderr]    graph_function, _ = self._maybe_define_function(args, kwargs)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3463, in _maybe_define_function
[stderr]    graph_function = self._create_graph_function(args, kwargs)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/eager/function.py", line 3308, in _create_graph_function
[stderr]    capture_by_value=self._capture_by_value),
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 1007, in func_graph_from_py_func
[stderr]    func_outputs = python_func(*func_args, **func_kwargs)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/eager/def_function.py", line 668, in wrapped_fn
[stderr]    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
[stderr]  File "/azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/func_graph.py", line 994, in wrapper
[stderr]    raise e.ag_error_metadata.to_exception(e)
[stderr]TypeError: in user code:
[stderr]
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/orbit/utils/loop_fns.py:118 loop_fn  *
[stderr]        step_fn(iterator)
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/trainer/trainer.py:217 step_fn  *
[stderr]        self._train_step(inputs)
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/trainer/trainer.py:229 _train_step  *
[stderr]        outputs = self._model(inputs[common.IMAGE], training=True)
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/deeplab.py:135 call  *
[stderr]        result_dict = self._decoder(
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/encoder/axial_resnet.py:761 call  *
[stderr]        current_output, activated_output, memory_feature, endpoints = (
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/encoder/axial_resnet.py:560 call_encoder_before_stacked_decoder  *
[stderr]        current_output, activated_output, memory_feature = (
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/layers/axial_block_groups.py:366 call  *
[stderr]        drop_path_random_mask = drop_path.generate_drop_path_random_mask(
[stderr]    /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/exe/wd/deeplab2/model/layers/drop_path.py:78 generate_drop_path_random_mask  *
[stderr]        random_tensor += tf.random.uniform(
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/util/dispatch.py:206 wrapper  **
[stderr]        return target(*args, **kwargs)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/ops/random_ops.py:296 random_uniform
[stderr]        shape = tensor_util.shape_tensor(shape)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:1080 shape_tensor
[stderr]        return ops.convert_to_tensor(shape, dtype=dtype, name="shape")
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/profiler/trace.py:163 wrapped
[stderr]        return func(*args, **kwargs)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/ops.py:1566 convert_to_tensor
[stderr]        ret = conversion_func(value, dtype=dtype, name=name, as_ref=as_ref)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:346 _constant_tensor_conversion_function
[stderr]        return constant(v, dtype=dtype, name=name)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:272 constant
[stderr]        allow_broadcast=True)
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/constant_op.py:290 _constant_impl
[stderr]        allow_broadcast=allow_broadcast))
[stderr]    /azureml-envs/pytorch-1.9/lib/python3.7/site-packages/tensorflow/python/framework/tensor_util.py:553 make_tensor_proto
[stderr]        "supported type." % (type(values), values))
[stderr]
[stderr]    TypeError: Failed to convert object of type <class 'tuple'> to Tensor. Contents: (None, 1, 1, 1). Consider casting elements to a supported type.
[stderr]
[stderr]
restoring or initializing model...
loading initial checkpoint initial_checkpoint/swidernet_sac_1_1_4.5_os16_panoptic_deeplab_cityscapes_trainfine/ckpt-60000
load initial checkpoint initial_checkpoint/swidernet_sac_1_1_4.5_os16_panoptic_deeplab_cityscapes_trainfine/ckpt-60000
initialized model.
saved checkpoint to /mnt/azureml/cr/j/16e36723e2274cd7ba286426bf351baf/cap/data-capability/wd/output_d5d1c53e_workspaceblobstore/air_flange_panoptic_segmentation/ckpt-0.
train | step:      0 | training until step 50000...
Cleaning up all outstanding Run operations, waiting 300.0 seconds
0 items cleaning up...
Cleanup took 7.152557373046875e-07 seconds
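The `(None, 1, 1, 1)` in the error is the static shape of the activations under `tf.function`, where the batch dimension is unknown, and a tuple containing `None` cannot be converted to a shape tensor. A minimal NumPy sketch of the per-sample drop-path masking this code performs, reading the batch size from the concrete input at call time (the analogue of using the dynamic `tf.shape(x)[0]` rather than the static shape; the function name and details here are illustrative, not the deeplab2 code):

```python
import numpy as np

def generate_drop_path_mask(x, keep_prob, rng=None):
    """NumPy sketch of per-sample drop-path masking (stochastic depth).

    The batch size is read from the concrete input at call time, which is
    the analogue of using the dynamic tf.shape(x)[0] in graph mode, where
    the static shape can be (None, 1, 1, 1) and is not a valid shape.
    """
    rng = rng or np.random.default_rng(0)
    batch = x.shape[0]  # concrete int here; None statically under tf.function
    # One draw per sample, broadcast over the H, W, C dimensions.
    random_tensor = keep_prob + rng.uniform(size=(batch, 1, 1, 1))
    return np.floor(random_tensor)  # 1.0 with probability keep_prob, else 0.0

x = np.ones((2, 4, 4, 3), dtype=np.float32)
mask = generate_drop_path_mask(x, keep_prob=0.8)
y = x / 0.8 * mask  # surviving samples rescaled so expectation is unchanged
```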

Here is my experiment config file:

# proto-file: deeplab2/config.proto
# proto-message: ExperimentOptions
#
# Panoptic-DeepLab with SWideRNet-(1, 1, 4.5) backbone and output stride 16.
#
############### PLEASE READ THIS BEFORE USING THIS CONFIG ###############
# Before using this config, you need to update the following fields:
# - experiment_name: Use a unique experiment name for each experiment.
# - initial_checkpoint: Update the path to the initial checkpoint.
# - train_dataset_options.file_pattern: Update the path to the
#   training set. e.g., your_dataset/train*.tfrecord
# - eval_dataset_options.file_pattern: Update the path to the
#   validation set, e.g., your_dataset/eval*.tfrecord
# - (optional) set merge_semantic_and_instance_with_tf_op: true, if you
#   could successfully compile the provided efficient merging operation
#   under the folder `tensorflow_ops`.
#########################################################################
#
# The `swidernet` backbone scales the Wide-ResNet stem width, backbone width,
# and layer count by the multipliers set below; this config also enables
# Switchable Atrous Convolution (SAC) via `use_sac_beyond_stride`.
#
# References:
# For SWideRNet, see
# - Liang-Chieh Chen, et al. "Scaling Wide Residual Networks for Panoptic
#   Segmentation."
# For Panoptic-DeepLab, see
# - Bowen Cheng, et al. "Panoptic-DeepLab: A Simple, Strong, and Fast Baseline
#   for Bottom-Up Panoptic Segmentation." In CVPR, 2020.

# Use a unique experiment_name for each experiment.
experiment_name: "air_flange_panoptic_segmentation"
model_options {
  # Update the path to the initial checkpoint (e.g., ImageNet
  # pretrained checkpoint).
  initial_checkpoint: "initial_checkpoint/swidernet_sac_1_1_4.5_os16_panoptic_deeplab_cityscapes_trainfine/ckpt-60000"

  # Set whether to restore the last convolution of the semantic head when
  # loading from the initial checkpoint. Setting this flag to false is useful
  # when an initial checkpoint was trained on a dataset with different classes.
  restore_semantic_last_layer_from_initial_checkpoint: false

  # Set whether to restore the last convolution of the instance heads when
  # loading from the initial checkpoint. Depending on the meta architecture,
  # this includes center heatmap, center regression and motion regression.
  restore_instance_last_layer_from_initial_checkpoint: false

  backbone {
    name: "swidernet"
    output_stride: 16
    stem_width_multiplier: 1
    backbone_width_multiplier: 1
    backbone_layer_multiplier: 4.5
    use_sac_beyond_stride: 32
    drop_path_keep_prob: 0.8
    drop_path_schedule: "linear"
  }
  decoder {
    feature_key: "res5"
    decoder_channels: 256
    aspp_channels: 256
    atrous_rates: 6
    atrous_rates: 12
    atrous_rates: 18
  }
  panoptic_deeplab {
    low_level {
      feature_key: "res3"
      channels_project: 64
    }
    low_level {
      feature_key: "res2"
      channels_project: 32
    }
    instance {
      low_level_override {
        feature_key: "res3"
        channels_project: 32
      }
      low_level_override {
        feature_key: "res2"
        channels_project: 16
      }
      instance_decoder_override {
        feature_key: "res5"
        decoder_channels: 128
        atrous_rates: 6
        atrous_rates: 12
        atrous_rates: 18
      }
      center_head {
        output_channels: 1
        head_channels: 32
      }
      regression_head {
        output_channels: 2
        head_channels: 32
      }
    }
    semantic_head {
      output_channels: 2
      head_channels: 256
    }
  }
}
trainer_options {
  save_checkpoints_steps: 1000
  save_summaries_steps: 100
  steps_per_loop: 100
  loss_options {
    semantic_loss {
      name: "softmax_cross_entropy"
      weight: 1.0
      top_k_percent: 0.2
    }
    center_loss {
      name: "mse"
      weight: 200
    }
    regression_loss {
      name: "l1"
      weight: 0.01
    }
  }
  solver_options {
    base_learning_rate: 0.00025
    training_number_of_steps: 50000
  }
}
train_dataset_options {
  dataset: "air_flange_panoptic"
  # Update the path to training set.
  file_pattern: "train*.tfrecord"
  # Adjust the batch_size accordingly to better fit your GPU/TPU memory.
  # Also see Q1 in g3doc/faq.md.
  batch_size: 2
  crop_size: 1025
  crop_size: 2049
  min_resize_value: 0
  max_resize_value: 0
  increase_small_instance_weights: true
  small_instance_weight: 3.0
}
eval_dataset_options {
  dataset: "air_flange_panoptic"
  # Update the path to validation set.
  file_pattern: "val*.tfrecord"
  batch_size: 1
  crop_size: 1025
  crop_size: 2049
  min_resize_value: 0
  max_resize_value: 0
  # Add options to make the evaluation loss comparable to the training loss.
  increase_small_instance_weights: true
  small_instance_weight: 3.0
}
evaluator_options {
  continuous_eval_timeout: 43200
  stuff_area_limit: 2048
  center_score_threshold: 0.1
  nms_kernel: 13
  save_predictions: true
  save_raw_predictions: false
  # Use pure tf functions (i.e., no CUDA kernel) to merge semantic and
  # instance maps. For faster speed, compile TensorFlow with provided kernel
  # implementation under the folder `tensorflow_ops`, and set
  # merge_semantic_and_instance_with_tf_op to true.
  merge_semantic_and_instance_with_tf_op: false
  eval_interval: 1000
}
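For context on the `drop_path_keep_prob: 0.8` / `drop_path_schedule: "linear"` settings involved in the failing code path: under a linear schedule, deeper blocks are dropped more often during training. A hedged sketch of how such a schedule could derive a per-block keep probability (the function name and exact formula are assumptions; the real logic lives in deeplab2/model/layers/drop_path.py):

```python
def drop_path_keep_prob(final_keep_prob, block_index, num_blocks,
                        schedule="linear"):
    """Sketch of deriving a per-block keep probability from the config value.

    Under a 'linear' schedule (as in the stochastic depth paper), early
    blocks keep close to 1.0 and the last block uses the configured value
    (0.8 in the config above).
    """
    if schedule == "constant":
        return final_keep_prob
    # Linear decay from ~1.0 at the first block to final_keep_prob at the last.
    ratio = (block_index + 1) / num_blocks
    return 1.0 - ratio * (1.0 - final_keep_prob)

# The first block survives almost always; the last matches the config value.
first = drop_path_keep_prob(0.8, block_index=0, num_blocks=10)
last = drop_path_keep_prob(0.8, block_index=9, num_blocks=10)
```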

Hi @davidblom603,

Thanks for the issue.
It is not immediately clear to us what is happening here.
Could you please first verify that the code runs on an academic dataset (e.g., Cityscapes)?

Cheers,

Hi @davidblom603,

It has been a while, and we hope you have figured out the issue.
Closing the issue now due to lack of activity.
But please feel free to reopen it if you encounter any other issues.

Cheers,