Doodleverse/segmentation_gym

Tensor Shape error when mixing Greyscale + Color images

sbosse12 opened this issue · 7 comments

Hi all,
When I attempt to train a model that classifies oblique coastline imagery into three classes (water, land, sky), I receive the error below:

Epoch 00001: LearningRateScheduler setting learning rate to 1e-07.
Traceback (most recent call last):
File "X:\Imagery\CamerasOfOpportunity\2022_Ian_Doodleverse\segmentation_gym\train_model.py", line 760, in <module>
history = model.fit(train_ds, steps_per_epoch=steps_per_epoch, epochs=MAX_EPOCHS,
File "C:\ProgramData\Anaconda3\envs\gym\lib\site-packages\tensorflow\python\keras\engine\training.py", line 1193, in fit
tmp_logs = self.train_function(iterator)
File "C:\ProgramData\Anaconda3\envs\gym\lib\site-packages\tensorflow\python\eager\def_function.py", line 885, in __call__
result = self._call(*args, **kwds)
File "C:\ProgramData\Anaconda3\envs\gym\lib\site-packages\tensorflow\python\eager\def_function.py", line 950, in _call
return self._stateless_fn(*args, **kwds)
File "C:\ProgramData\Anaconda3\envs\gym\lib\site-packages\tensorflow\python\eager\function.py", line 3039, in __call__
return graph_function._call_flat(
File "C:\ProgramData\Anaconda3\envs\gym\lib\site-packages\tensorflow\python\eager\function.py", line 1963, in _call_flat
return self._build_call_outputs(self._inference_function.call(
File "C:\ProgramData\Anaconda3\envs\gym\lib\site-packages\tensorflow\python\eager\function.py", line 591, in call
outputs = execute.execute(
File "C:\ProgramData\Anaconda3\envs\gym\lib\site-packages\tensorflow\python\eager\execute.py", line 59, in quick_execute
tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Cannot batch tensors with different shapes in component 0. First element had shape [768,1024,1] and element 1 had shape [768,1024,3].
[[node IteratorGetNext (defined at X:\Imagery\CamerasOfOpportunity\2022_Ian_Doodleverse\segmentation_gym\train_model.py:760) ]]
(1) Invalid argument: Cannot batch tensors with different shapes in component 0. First element had shape [768,1024,1] and element 1 had shape [768,1024,3].
[[node IteratorGetNext (defined at X:\Imagery\CamerasOfOpportunity\2022_Ian_Doodleverse\segmentation_gym\train_model.py:760) ]]
[[Shape/_6]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_13417]

Function call stack:
train_function -> train_function

Are y'all familiar with this?

Here is my config file for reference, as well as some test images and labels:
Ian_3class_test.zip

"TARGET_SIZE": [768, 1024],
"MODEL": "resunet",
"NCLASSES": 3,
"BATCH_SIZE": 7,
"N_DATA_BANDS": 3,
"DO_TRAIN": true,
"PATIENCE": 10,
"MAX_EPOCHS": 100,
"VALIDATION_SPLIT": 0.6,
"FILTERS":6,
"KERNEL":9,
"STRIDE":2,
"LOSS": "dice",
"DROPOUT":0.1,
"DROPOUT_CHANGE_PER_LAYER":0.0,
"DROPOUT_TYPE":"standard",
"USE_DROPOUT_ON_UPSAMPLING":false,
"ROOT_STRING": "Hurr_Ian_water_mask",
"FILTER_VALUE": 0,
"DOPLOT": true,
"USEMASK": false,
"RAMPUP_EPOCHS": 20,
"SUSTAIN_EPOCHS": 0.0,
"EXP_DECAY": 0.9,
"START_LR": 1e-7,
"MIN_LR": 1e-7,
"MAX_LR": 1e-4,
"AUG_ROT": 5,
"AUG_ZOOM": 0.05,
"AUG_WIDTHSHIFT": 0.05,
"AUG_HEIGHTSHIFT": 0.05,
"AUG_HFLIP": true,
"AUG_VFLIP": false,
"AUG_LOOPS": 10,
"AUG_COPIES": 5,
"TESTTIMEAUG": false,
"SET_GPU": "0",
"do_crf": true,
"SET_PCI_BUS_ID": true

This is a tensor shape error:

Cannot batch tensors with different shapes in component 0. First element had shape [768,1024,1] and element 1 had shape [768,1024,3].

It seems like your images might be a mix of greyscale and color?
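For illustration only, here is a minimal NumPy sketch of why the batching step fails. It is an analogy, not Gym's or TensorFlow's actual code: stacking arrays into a batch requires every element to have the same shape, so a 1-band image and a 3-band image cannot share a batch. The helper name `try_batch` and the tiny array sizes are made up for the example.

```python
import numpy as np

# Small stand-ins for a greyscale image (1 band) and a color image (3 bands).
grey = np.zeros((4, 6, 1), dtype=np.uint8)
color = np.zeros((4, 6, 3), dtype=np.uint8)


def try_batch(images):
    """Stack images into one batch array and return its shape,
    or return the error message if the shapes do not match."""
    try:
        return np.stack(images).shape
    except ValueError as err:
        return str(err)


print(try_batch([color, color]))  # matching shapes stack into a 4-D batch
print(try_batch([grey, color]))   # mismatched band counts cannot be stacked
```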

Note that the config sets "N_DATA_BANDS" (3 in your case), and that number is expected to hold for all the images.

The easiest thing to do here is to make a model using either all greyscale or all color images. You could convert all greyscale images to color (or vice versa), then run makedatasets again and train again.
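The greyscale-to-color conversion suggested above could be sketched roughly as follows, assuming Pillow is installed. This is not part of Gym: the function name `convert_folder_to_rgb` and the folder names in the usage comment are hypothetical. Files that are already RGB are copied unchanged to avoid re-encoding them. Point it at the image folder only; label images are normally single-band and should be left alone.

```python
import shutil
from pathlib import Path

from PIL import Image

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def convert_folder_to_rgb(src_dir, dst_dir):
    """Write a 3-band RGB version of every image in src_dir into dst_dir.

    Returns the names of the files that had to be converted.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    converted = []
    for path in sorted(src.iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        with Image.open(path) as im:
            needs_conversion = im.mode != "RGB"
            if needs_conversion:
                # For greyscale input, the single band is repeated three times.
                im.convert("RGB").save(dst / path.name)
        if needs_conversion:
            converted.append(path.name)
        else:
            # Already RGB: copy the file as-is rather than re-encoding it.
            shutil.copy2(path, dst / path.name)
    return converted


# Hypothetical usage:
# changed = convert_folder_to_rgb("images", "images_rgb")
# print(changed)
```

After converting, makedatasets still needs to be rerun on the new folder before training.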

Does this make sense?

Yes, that does make sense! Good to know. I was thinking it was an error that occurred during the make_nd_datasets phase. I'll convert, try again, and report back.

Thanks Evan, we're running now!

nice!

Hey folks, I get the same error message, but I do not have greyscale images in my datasets. They are all RGB JPEGs. When I first noticed this issue, I found through trial and error that training would work on a dataset of 100 images, but not on more than 100. When I returned to this a month later, my 100-image dataset (literally the same files) no longer worked, although I had adjusted the batch size in the config file (the only change!). I brought the number of images down to 88 and it worked.

This happened both before and after a fresh update of Gym (in early November, and again as we speak).

I wonder whether it's not the size of the dataset at all, and instead I happen to be removing offending images when I shrink it. But like I said, there is no mixing of image types.

@jmdelvecchio - I think you are right that there are offending images. You could add them back in groups to find the problem files. Also, just keep in mind that if you adjust the batch size, you need to rerun makedatasets...
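As a complement to adding images back in groups, a quick scan of the image folder can list files by band count. This is a rough sketch assuming Pillow is installed, not a Gym utility; the function name `group_by_shape` and the folder name in the usage comment are hypothetical. Band count is what the error message above complains about; width and height are included only as extra context.

```python
from collections import defaultdict
from pathlib import Path

from PIL import Image

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def group_by_shape(image_dir):
    """Map (width, height, bands) to the list of file names with that shape."""
    groups = defaultdict(list)
    for path in sorted(Path(image_dir).iterdir()):
        if path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        with Image.open(path) as im:
            # getbands() returns one entry per band, e.g. ("L",) or ("R", "G", "B").
            key = (im.width, im.height, len(im.getbands()))
        groups[key].append(path.name)
    return dict(groups)


# Hypothetical usage: print the rarest shapes first, since those are the
# most likely candidates for offending files.
# for shape, names in sorted(group_by_shape("images").items(),
#                            key=lambda item: len(item[1])):
#     print(shape, len(names), names[:5])
```

Any file whose band count differs from "N_DATA_BANDS" in the config is worth checking first.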

Also, do you want to keep discussing this here, reopen this issue, or make a new one?

Feel free to drop a config file in here, and even send us a link (via email) to the zipped images and labels...

I re-ran makedatasets. I also looked over the potential "offending" images (I removed 12 images from a single AOI) and nothing strikes me as wrong, so perhaps this should be a new issue since it's not greyscale. I'll go and make one now.