Doodleverse/segmentation_gym

If resized folders exist (but are empty) there is no resizing (i.e., they are not populated)

ebgoldstein opened this issue · 4 comments

deep in the issue #92 thread, there seems to be a little hiccup in make_datasets when the resized folders (resized_images and resized_labels) already exist:

if these directories exist, but are empty (or not filled with the correct images), then the whole resizing operation is skipped.

    # if directories already exist, skip them
    if os.path.isdir(newdireclabels):
        print("{} already exists: skipping the image resizing step".format(newdireclabels))
    else:
        try:
            os.mkdir(newdireclabels)
        except:
            pass
        if len(W)==1:
            try:
                w = Parallel(n_jobs=-2, verbose=0, max_nbytes=None)(delayed(do_resize_image)(os.path.normpath(f), TARGET_SIZE) for f in files)
            except:
                w = Parallel(n_jobs=-2, verbose=0, max_nbytes=None)(delayed(do_resize_image)(os.path.normpath(f), TARGET_SIZE) for f in files.squeeze())
            w = Parallel(n_jobs=-2, verbose=0, max_nbytes=None)(delayed(do_resize_label)(os.path.normpath(lfile), TARGET_SIZE) for lfile in label_files)
        else:
            ## cycle through, merge and padd/resize if need to
            for file,lfile in zip(files, label_files):
                for f in file:
                    do_resize_image(f, TARGET_SIZE)
                do_resize_label(lfile, TARGET_SIZE)
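
The condition is easy to reproduce in isolation: `os.path.isdir` returns True for an empty directory, so that check alone cannot tell a populated folder from an empty one. A minimal sketch, using a temporary directory as a stand-in for resized_labels (the path is illustrative, not the one make_datasets builds):

```python
import os
import tempfile

# Stand-in for the resized_labels folder; the location is illustrative only.
with tempfile.TemporaryDirectory() as root:
    newdireclabels = os.path.join(root, "resized_labels")
    os.mkdir(newdireclabels)  # the folder exists but holds no files

    exists = os.path.isdir(newdireclabels)     # True, so the "skip" branch would run
    n_files = len(os.listdir(newdireclabels))  # 0, nothing was ever resized

    print(exists, n_files)
```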

So, Q for @dbuscombe-usgs :

  • is this something to fix via code (search for each file and make sure it exists? Or just remake resized images?)
  • is this something to fix via documentation (tell people to make sure to delete resized image folders before running make_datasets?)
  • is there another option?

actually i am reading down the code and it may already be implemented in some form?

The skipping of resized folder generation serves the needs of those who use the same dataset to make multiple sets of npzs, for example for creating 2-, 3- and 4-class datasets

Yes you are correct that if the prescribed workflow is not followed, or if something goes wrong, the directories 'resized_images' and 'resized_labels' would need to be deleted (or whatever it takes)

is this something to fix via documentation (tell people to make sure to delete resized image folders before running make_datasets?)

No, ordinarily the resized_images and resized_labels directories should not be deleted, if they were created properly

is this something to fix via code (search for each file and make sure it exists? Or just remake resized images?)

Yes, this would be the solution. Program could be updated to

  1. check directories are not empty
  2. check that both directories contain the same number of files

anything else to check?
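
The two checks proposed above could be sketched roughly as follows. This is not code from the repository: `resized_dirs_ok` is a hypothetical helper, and it assumes one resized label per resized image (the single-band `len(W)==1` case), so it would need adjusting for multi-band inputs.

```python
import os
import tempfile


def resized_dirs_ok(image_dir, label_dir):
    """Return True only if both folders exist, are non-empty,
    and contain the same number of files.

    Hypothetical helper; assumes one label file per image file.
    """
    for d in (image_dir, label_dir):
        if not os.path.isdir(d):
            return False

    def count_files(d):
        # Count regular files only, ignoring any sub-directories.
        return sum(
            1 for name in os.listdir(d)
            if os.path.isfile(os.path.join(d, name))
        )

    n_images = count_files(image_dir)
    n_labels = count_files(label_dir)
    return n_images > 0 and n_images == n_labels


# Small demonstration with temporary folders.
with tempfile.TemporaryDirectory() as root:
    imdir = os.path.join(root, "resized_images")
    lbdir = os.path.join(root, "resized_labels")
    os.mkdir(imdir)
    os.mkdir(lbdir)

    empty_ok = resized_dirs_ok(imdir, lbdir)      # both folders empty

    open(os.path.join(imdir, "a.jpg"), "w").close()
    mismatch_ok = resized_dirs_ok(imdir, lbdir)   # 1 image, 0 labels

    open(os.path.join(lbdir, "a.jpg"), "w").close()
    matched_ok = resized_dirs_ok(imdir, lbdir)    # 1 image, 1 label

    print(empty_ok, mismatch_ok, matched_ok)
```

With a check along these lines, the resizing step could be skipped only when it passes, and the resized folders rebuilt otherwise.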

actually i am reading down the code and it may already be implemented in some form?

Specifically, what may already be implemented?

Honestly I don't really know under what circumstances the folders would be empty. Seems like a non-issue

yep, agreed.. let's close it.. would only be empty if make_datasets fails...