Rank error in function misc.resolve_shape when trying to use flag upsample_logits
kmkajak opened this issue · 2 comments
I am trying to use the repository to do pose estimation on my own dataset.
First of all, everything seems to work fine in check_train_input.py, train.py, eval.py, and infer.py with the following parameters in params.yml:
#Dataset.
dataset: "sphere"
#Model.
model_variant: "xception_65"
atrous_rates: [12, 24, 36]
encoder_output_stride: 8
decoder_output_stride: [4]
upsample_logits: false
frag_seg_agnostic: false
frag_loc_agnostic: false
num_frags: 64
#Establishing correspondences.
corr_min_obj_conf: 0.1
corr_min_frag_rel_conf: 0.5
corr_project_to_model: false
#Training.
train_tfrecord_names: ["sphere_train-blender"]
train_max_height_before_crop: 128
train_crop_size: "128,128"
optimizer: "AdamOptimizer"
save_interval_steps: 10000
initialize_last_layer: false
fine_tune_batch_norm: false
train_steps: 4500000
train_batch_size: 4
base_learning_rate: 0.0001
obj_cls_loss_weight: 1.0
frag_cls_loss_weight: 1.0
frag_loc_loss_weight: 100.0
train_knn_frags: 1
data_augmentations:
  random_adjust_brightness:
    min_delta: -0.15
    max_delta: 0.15
  random_adjust_contrast:
    min_delta: 0.85
    max_delta: 1.15
  random_adjust_saturation:
    min_delta: 0.85
    max_delta: 1.15
  random_adjust_hue:
    max_delta: 1.0
  random_blur:
    max_sigma: 1.5
  random_gaussian_noise:
    max_sigma: 0.03
  jpeg_artifacts:
    min_quality: 85
However, when I enable the upsample_logits flag, I get the following error:
Traceback (most recent call last):
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 928, in merge_with
self.assert_same_rank(other)
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 982, in assert_same_rank
raise ValueError("Shapes %s and %s must have the same rank" %
ValueError: Shapes (?, 128, 128) and (?, ?, ?, ?) must have the same rank
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 1013, in with_rank
return self.merge_with(unknown_shape(rank=rank))
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 934, in merge_with
raise ValueError("Shapes %s and %s are not compatible" % (self, other))
ValueError: Shapes (?, 128, 128) and (?, ?, ?, ?) are not compatible
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "train.py", line 584, in <module>
tf.app.run()
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/tensorflow_core/python/platform/app.py", line 40, in run
_run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "train.py", line 505, in main
train_tensor, summary_op = _train_epos_model(
File "train.py", line 374, in _train_epos_model
loss = _tower_loss(
File "train.py", line 285, in _tower_loss
_build_epos_model(
File "train.py", line 202, in _build_epos_model
loss.add_obj_cls_loss(
File "/home/user/phd/epos/epos_lib/loss.py", line 131, in add_obj_cls_loss
targets_shape = misc.resolve_shape(targets, 4)[1:3]
File "/home/user/phd/epos/epos_lib/misc.py", line 44, in resolve_shape
shape = tensor.get_shape().with_rank(rank).as_list()
File "/home/user/miniconda/envs/eposaidev/lib/python3.8/site-packages/tensorflow_core/python/framework/tensor_shape.py", line 1015, in with_rank
raise ValueError("Shape %s must have rank %d" % (self, rank))
ValueError: Shape (?, 128, 128) must have rank 4
I tried multiple sources of data, including the tfrecord file ycbv_test_targets-bop19.tfrecord provided by the authors, so I am fairly confident that the data is not the source of the issue. However, I am not deeply familiar with this repository and have not yet traced the full path of the data through the code up to the point of failure.
Any clues or insights as to what the shapes should be at this point of failure? I appreciate the help.
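For reference, the final ValueError in the traceback comes down to a rank check on the static shape. The following is a minimal sketch that reproduces that check without TensorFlow. It assumes a pure-Python stand-in for `tensor.get_shape().with_rank(rank).as_list()`; the helper name `static_shape_with_rank` is hypothetical and is not part of the repository or of TensorFlow.

```python
from typing import List, Optional, Sequence


def static_shape_with_rank(shape: Sequence[Optional[int]], rank: int) -> List[Optional[int]]:
    """Hypothetical stand-in for `tensor.get_shape().with_rank(rank).as_list()`.

    Raises ValueError when the number of dimensions differs from `rank`.
    """
    if len(shape) != rank:
        raise ValueError("Shape %s must have rank %d" % (tuple(shape), rank))
    return list(shape)


# Static shape of `targets` as reported in the traceback: (batch, height, width).
# None plays the role of the unknown batch dimension shown as "?".
targets_shape = (None, 128, 128)

# Requesting rank 4 fails, mirroring the last line of the traceback.
try:
    static_shape_with_rank(targets_shape, 4)
    rank4_error = None
except ValueError as err:
    rank4_error = str(err)

# Requesting rank 3 succeeds, and [1:3] selects height and width.
spatial = static_shape_with_rank(targets_shape, 3)[1:3]

print(rank4_error)
print(spatial)
```

In this sketch, a rank-3 shape passes the check only when rank 3 is requested, which is consistent with the shapes printed in the traceback.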
Hi @kmkajak, as far as I remember, I always kept upsample_logits set to False, so unfortunately I am not familiar with this issue. Is it necessary to upsample the logits in your case?
Hi @thodan, I have a dataset in which the object appears over a wide range of distances, from 5 to 30 meters from the camera, so the target can cover only a small number of pixels in the output. I was hoping to avoid using an object detector to solve this problem, so I have been experimenting with hyperparameters to see what can be achieved without one. Upsampling to the input image size seems to help, since post-processing then has more information to work with.
I have now managed to get upsampling to work by fixing the resizing calls to use the appropriate rank and to select the appropriate dimensions. However, I have yet to verify conclusively that the results are correct. It SEEMS to work, and a test training run SEEMS to confirm the changes.
The changes:
- in the function add_obj_cls_loss within module loss:
    if upsample_logits:
      targets_shape = misc.resolve_shape(targets, 3)[1:3]
- in the function add_frag_cls_loss within module loss:
    if upsample_logits:
      logits = resize_logits(logits, shape[1:3])
- in the function add_frag_loc_loss within module loss:
    if upsample_logits:
      logits = resize_logits(logits, shape[1:3])
In conclusion, within add_obj_cls_loss the wrong rank was specified (originally 4, now 3), and within all three loss functions too many dimensions were previously selected for resizing the feature maps.
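To illustrate why the slice [1:3] is used in the changes above, here is a minimal sketch in plain Python. It assumes batch-first layouts: [batch, height, width] for the targets and [batch, height, width, channels] for the logits. The helper `spatial_size` and the example logits shape are hypothetical and only illustrate the indexing; they are not taken from the repository.

```python
def spatial_size(shape):
    """Return [height, width] from a batch-first shape of rank 3 or 4.

    Hypothetical helper used only to illustrate the [1:3] slice.
    """
    if len(shape) not in (3, 4):
        raise ValueError("Expected rank 3 or 4, got rank %d" % len(shape))
    return list(shape[1:3])


# Targets: [batch, height, width], matching train_batch_size 4 and crop size 128x128.
targets_shape = [4, 128, 128]

# Logits: [batch, height, width, channels]. The 32x32 size assumes an output
# stride of 4; the channel count is an arbitrary illustrative value.
logits_shape = [4, 32, 32, 64]

target_size = spatial_size(targets_shape)
logits_size = spatial_size(logits_shape)

# Factor by which the logits would need to be upsampled to match the targets.
scale = [t // l for t, l in zip(target_size, logits_size)]

print(target_size, logits_size, scale)
```

In both layouts the slice yields exactly two values, which is what a 2-D resize target needs.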