Unable to use run_training.py with custom dataset
I'm trying to use run_training.py like so:
!python run_training.py --num-gpus=1 --data-dir=dataset --config=config-f --dataset=blows --mirror-augment=false --metric=none --total-kimg=20000 --result-dir="/content/drive/My Drive/stylegan2/results"
It gives me the following error:
Local submit - run_dir: /content/drive/My Drive/stylegan2/results/00026-stylegan2-blows-1gpu-config-f
dnnlib: Running training.training_loop.training_loop() on localhost...
Streaming data using training.dataset.TFRecordDataset...
Traceback (most recent call last):
File "run_training.py", line 209, in <module>
main()
File "run_training.py", line 204, in main
run(**vars(args))
File "run_training.py", line 129, in run
dnnlib.submit_run(**kwargs)
File "/content/stylegan2/dnnlib/submission/submit.py", line 343, in submit_run
return farm.submit(submit_config, host_run_dir)
File "/content/stylegan2/dnnlib/submission/internal/local.py", line 22, in submit
return run_wrapper(submit_config)
File "/content/stylegan2/dnnlib/submission/submit.py", line 280, in run_wrapper
run_func_obj(**submit_config.run_func_kwargs)
File "/content/stylegan2/training/training_loop.py", line 156, in training_loop
training_set = dataset.load_dataset(data_dir=dnnlib.convert_path(data_dir), verbose=True, **dataset_args)
File "/content/stylegan2/training/dataset.py", line 239, in load_dataset
dataset = dnnlib.util.get_obj_by_name(class_name)(**adjusted_kwargs)
File "/content/stylegan2/training/dataset.py", line 167, in __init__
dset = dset.map(parse_tfrecord_tf_raw, num_parallel_calls=num_threads)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 1913, in map
self, map_func, num_parallel_calls, preserve_cardinality=False))
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 3472, in __init__
use_legacy_function=use_legacy_function)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2713, in __init__
self._function = wrapper_fn._get_concrete_function_internal()
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1853, in _get_concrete_function_internal
*args, **kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 1847, in _get_concrete_function_internal_garbage_collected
graph_function, _, _ = self._maybe_define_function(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2147, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/eager/function.py", line 2038, in _create_graph_function
capture_by_value=self._capture_by_value),
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/func_graph.py", line 915, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2707, in wrapper_fn
ret = _wrapper_helper(*args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/data/ops/dataset_ops.py", line 2652, in _wrapper_helper
ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
File "/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/autograph/impl/api.py", line 237, in wrapper
raise e.ag_error_metadata.to_exception(e)
TypeError: in converted code:
/content/stylegan2/training/dataset.py:27 parse_tfrecord_tf_raw *
features = tf.parse_single_example(
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/parsing_ops.py:1019 parse_single_example
serialized, features, example_names, name
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/parsing_ops.py:1063 parse_single_example_v2_unoptimized
return parse_single_example_v2(serialized, features, name)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/parsing_ops.py:2093 parse_single_example_v2
dense_defaults, dense_shapes, name)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/parsing_ops.py:2210 _parse_single_example_v2_raw
name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/gen_parsing_ops.py:1201 parse_single_example
dense_shapes=dense_shapes, name=name)
/usr/local/lib/python3.6/dist-packages/tensorflow_core/python/framework/op_def_library.py:551 _apply_op_helper
(prefix, dtypes.as_dtype(input_arg.type).name))
TypeError: Input 'serialized' of 'ParseSingleExample' Op has type uint8 that does not match expected type of string.
Yet running the following returns no errors:
import tensorflow as tf  # TensorFlow 1.15
features = tf.parse_single_example(
'/content/stylegan2/dataset/blows/blows-r07.tfrecords',
features={
"shape": tf.FixedLenFeature([3], tf.int64),
"img": tf.FixedLenFeature([], tf.string),
},
)
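Note that the snippet above passes the file path itself as `serialized`. TensorFlow only sees a Python string, which it accepts as a tf.string, and never opens the file, so the snippet doesn't actually exercise the records. To see what is inside the file, a minimal stdlib-only sketch like the one below can walk the TFRecord framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC) and report each record's size. `iter_record_sizes` is a hypothetical helper, not part of the repo, and it skips CRC verification. As a rough heuristic, a record far smaller than height * width * 3 bytes most likely holds encoded JPEG/PNG bytes (the create_from_images_raw layout), while one slightly larger than that is consistent with a decoded uint8 array.

```python
import struct
from typing import BinaryIO, Iterator


def iter_record_sizes(stream: BinaryIO) -> Iterator[int]:
    """Yield the payload size (in bytes) of each record in a TFRecord stream.

    Per-record framing:
      uint64 length (little-endian)
      uint32 masked CRC32C of the length
      byte   data[length]
      uint32 masked CRC32C of the data
    CRCs are read but not checked, so this does not validate the file.
    """
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of stream
        if len(header) < 8:
            raise ValueError("truncated record header")
        (length,) = struct.unpack("<Q", header)
        stream.read(4)  # masked CRC of the length (skipped)
        payload = stream.read(length)
        if len(payload) < length:
            raise ValueError("truncated record payload")
        stream.read(4)  # masked CRC of the data (skipped)
        yield length
```

For example, `with open('/content/stylegan2/dataset/blows/blows-r07.tfrecords', 'rb') as f: print(list(iter_record_sizes(f))[:5])` prints the sizes of the first five records.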
What is your custom data format: JPEG images or array files?
@skyflynil They're (1024, 1024, 3) JPEG images in a folder.
I assume you created the tfrecord using:
!python dataset_tool.py create_from_images_raw --res_log2=8 ./dataset/dataset_name untared_raw_image_dir
and added the --min-h=4 --min-w=4 --res-log2=8 parameters to run_training.py?
I didn't add the extra arguments; I left them at their defaults:
!python dataset_tool.py create_from_images_raw dataset/blows blows
I also used the defaults for !python run_training.py. Should I add them and try again?
By default, run_training.py trains on 512x512 images (min-h=4, min-w=4, res-log2=7), but that doesn't seem to be related to this error. A resolution mismatch usually shows up as a shape-mismatch error when the images are fed in during actual training. The error you got seems to be complaining about the tfrecord itself, which I don't quite understand. You're using TensorFlow 1.15.0, right?
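The numbers quoted here fit a simple relationship: the trained resolution is min-h (or min-w) times 2**res-log2, so the defaults give 4 * 2**7 = 512, and --res-log2=8 gives 4 * 2**8 = 1024 for 1024x1024 images. The sketch below only encodes that arithmetic. The formula is inferred from the values in this thread rather than taken from the repo's code, and `training_resolution` is a hypothetical name.

```python
from typing import Tuple


def training_resolution(min_h: int = 4, min_w: int = 4, res_log2: int = 7) -> Tuple[int, int]:
    """Return the (height, width) implied by --min-h/--min-w/--res-log2.

    Assumes resolution = min_size * 2**res_log2, which matches the defaults
    quoted in this thread (4 * 2**7 = 512).
    """
    scale = 2 ** res_log2
    return min_h * scale, min_w * scale


print(training_resolution())            # (512, 512) with the defaults
print(training_resolution(res_log2=8))  # (1024, 1024), matching 1024x1024 JPEGs
```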
Yes 😔. I've also noticed another user with the same issue in a fork of your repo: pbaylies#2 (comment)
It seems to me the issue there was:
The record was created using create_from_images_raw, but training by default reads the decoded format (pbaylies probably changed the default behavior).
The fix mentioned there basically matches the reading side to the way the record was created.
There's also a second part of that issue, raised by user @pender, which seems to have exactly the same stack trace as mine.
Also, what is the difference between create_from_images_raw and create_from_images? If create_from_images_raw also expects a directory of images, what are the benefits of using create_from_images_raw? I get that it reads the images as bytes, but should I use create_from_images to see if that helps with the problem?
create_from_images_raw puts the JPEG/PNG files into the tfrecord directly without decoding them, while create_from_images first decodes each image into a numpy array and then puts that array into the tfrecord.
The tradeoff is that create_from_images_raw produces a smaller record, but training has to pay the cost of decoding the images again and again. In my repo, training by default expects a tfrecord created with create_from_images_raw, while pbaylies's, I assume, expects one created with create_from_images.
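To put rough numbers on that tradeoff: a decoded 1024x1024 RGB uint8 image takes 1024 * 1024 * 3 = 3,145,728 bytes regardless of content, whereas the raw path stores only the encoded file's bytes. The sketch below compares the two for a list of image files. `decoded_bytes` and `compare_storage` are hypothetical helpers. They assume every image has the given shape, count a single full-resolution array per image, and ignore TFRecord/protobuf framing and any extra copies the tool may write.

```python
import os
from typing import Iterable, Tuple


def decoded_bytes(height: int, width: int, channels: int = 3) -> int:
    """Bytes needed for one decoded uint8 image array."""
    return height * width * channels


def compare_storage(image_paths: Iterable[str], shape: Tuple[int, int, int]) -> Tuple[int, int]:
    """Return (raw_total, decoded_total) in bytes for the given image files.

    raw_total: sum of the encoded file sizes (roughly what the raw path stores).
    decoded_total: one decoded uint8 array of `shape` per file.
    """
    paths = list(image_paths)
    raw_total = sum(os.path.getsize(p) for p in paths)
    decoded_total = len(paths) * decoded_bytes(*shape)
    return raw_total, decoded_total


print(decoded_bytes(1024, 1024))  # 3145728 bytes per 1024x1024 RGB image
```

For example, `compare_storage(glob.glob('blows/*.jpg'), (1024, 1024, 3))` (after `import glob`) gives both totals for the image folder used in this thread.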
I've discovered that the error only happens when using create_from_images_raw, not create_from_images. I'll investigate further after my exams on Monday.
Is there also a way of changing the output resolution of StyleGAN? I'm currently getting the following error when running run_training.py:
ValueError: Dimension 2 in both shapes must be equal, but are 1024 and 64. Shapes are [?,3,1024,1024] and [?,3,64,64].
I know you said the default image size is 512, but even after resizing all my images to 512x512, I still get the error, this time with unequal dimensions of 512 and 64. What am I doing wrong? Is there any way of changing the GAN input/output to 1024?
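Going by the flags suggested earlier in the thread (--res_log2 for dataset_tool.py and --min-h/--min-w/--res-log2 for run_training.py), a 1024x1024 dataset would need res-log2 = log2(1024 / 4) = 8 on both sides so that the dataset and the network agree on the resolution. The hypothetical helper below derives that value and prints a matching pair of commands. The flag spellings are copied from the comments above, the directory names are placeholders, and it is worth checking both scripts' `--help` output before relying on it.

```python
from typing import List


def res_log2_for(resolution: int, min_size: int = 4) -> int:
    """Return res_log2 such that min_size * 2**res_log2 == resolution.

    Raises ValueError if resolution is not min_size times a power of two.
    """
    if resolution <= 0 or resolution % min_size:
        raise ValueError(f"{resolution} is not a positive multiple of {min_size}")
    scale = resolution // min_size
    if scale & (scale - 1):  # non-zero unless scale is a power of two
        raise ValueError(f"{resolution} / {min_size} is not a power of two")
    return scale.bit_length() - 1


def build_commands(resolution: int, dataset: str, image_dir: str, min_size: int = 4) -> List[str]:
    """Build dataset_tool.py and run_training.py commands that agree on resolution.

    Flag spellings are taken from this thread; dataset and image_dir are placeholders.
    """
    r = res_log2_for(resolution, min_size)
    return [
        f"python dataset_tool.py create_from_images_raw --res_log2={r} ./dataset/{dataset} {image_dir}",
        f"python run_training.py --num-gpus=1 --data-dir=dataset --config=config-f "
        f"--dataset={dataset} --min-h={min_size} --min-w={min_size} --res-log2={r}",
    ]


for cmd in build_commands(1024, "blows", "blows"):
    print(cmd)
```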