generated_samples
fxctydfty opened this issue · 10 comments
I am able to run the training algorithm, but when I run generating_data, it never creates any output in the "generated_samples" folder. I attached the worker log here. Could you please help me with that?
Thanks in advance.
The logs look normal. How long has it been stuck without the generated_samples folder?
After it prints out "Finish Building", nothing happens. I tried several times; same thing.
I am running Python Version 3.7.10 and Tensorflow 1.14.0.
Could you please share the example_generating_data/config_generate_data.py and example_training/config.py you are using?
config_generate_data.py:
config = {
"scheduler_config": {
"gpu": ["0"],
"config_string_value_maxlen": 1000,
"result_root_folder": "../results/",
"scheduler_log_file_path": "scheduler_generate_data.log",
"log_file": "worker_generate_data.log",
"force_rerun": True
},
"global_config": {
"batch_size": 100,
"vis_freq": 200,
"vis_num_sample": 5,
"d_rounds": 1,
"g_rounds": 1,
"num_packing": 1,
"noise": True,
"feed_back": False,
"g_lr": 0.001,
"d_lr": 0.001,
"d_gp_coe": 10.0,
"gen_feature_num_layers": 1,
"gen_feature_num_units": 100,
"gen_attribute_num_layers": 3,
"gen_attribute_num_units": 100,
"disc_num_layers": 5,
"disc_num_units": 200,
"initial_state": "random",
"attr_d_lr": 0.001,
"attr_d_gp_coe": 10.0,
"g_attr_d_coe": 1.0,
"attr_disc_num_layers": 5,
"attr_disc_num_units": 200,
"generate_num_train_sample": 50000,
"generate_num_test_sample": 50000
},
"test_config": [
{
"dataset": ["web"],
"epoch": [2],
"run": [0, 1, 2],
"sample_len": [1, 5],
"extra_checkpoint_freq": [5],
"epoch_checkpoint_freq": [1],
"aux_disc": [False],
"self_norm": [False]
}
]
}
config.py:
config = {
"scheduler_config": {
"gpu": ["0","1"],
"config_string_value_maxlen": 1000,
"result_root_folder": "../results/"
},
"global_config": {
"batch_size": 100,
"vis_freq": 200,
"vis_num_sample": 5,
"d_rounds": 1,
"g_rounds": 1,
"num_packing": 1,
"noise": True,
"feed_back": False,
"g_lr": 0.001,
"d_lr": 0.001,
"d_gp_coe": 10.0,
"gen_feature_num_layers": 1,
"gen_feature_num_units": 100,
"gen_attribute_num_layers": 3,
"gen_attribute_num_units": 100,
"disc_num_layers": 5,
"disc_num_units": 200,
"initial_state": "random",
"attr_d_lr": 0.001,
"attr_d_gp_coe": 10.0,
"g_attr_d_coe": 1.0,
"attr_disc_num_layers": 5,
"attr_disc_num_units": 200,
},
"test_config": [
{
"dataset": ["web"],
"epoch": [1],
"run": [0, 1, 2],
"sample_len": [1, 5],
"extra_checkpoint_freq": [5],
"epoch_checkpoint_freq": [1],
"aux_disc": [False],
"self_norm": [False]
}
]
}
I see where the problem comes from. example_generating_data/gan_generate_data_task.py generates data from the mid-checkpoints. In your config.py, you train the model for only 1 epoch ("epoch": [1]), while the frequency for saving mid-checkpoints is 5 ("extra_checkpoint_freq": [5]), so the code never saved any mid-checkpoints, and thus it didn't generate any samples.
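The interaction between the two settings can be sketched as follows (this is an illustration of the scheduling logic described above, not the repo's actual code):

```python
def mid_checkpoint_epochs(num_epochs, extra_checkpoint_freq):
    """Epoch ids at which a mid-checkpoint would be written.

    A mid-checkpoint is saved only on epochs that are a multiple of
    extra_checkpoint_freq, so training fewer epochs than the frequency
    yields no mid-checkpoints at all.
    """
    return [e for e in range(1, num_epochs + 1)
            if e % extra_checkpoint_freq == 0]

print(mid_checkpoint_epochs(1, 5))   # epoch=1, freq=5 -> [] (nothing to generate from)
print(mid_checkpoint_epochs(20, 5))  # epoch=20, freq=5 -> [5, 10, 15, 20]
```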
If you want to generate data from the last checkpoint instead, you can delete these lines:
DoppelGANger/example_generating_data/gan_generate_data_task.py, lines 138 to 151 in e732a4d
and set mid_checkpoint_dir = checkpoint_dir and save_path = checkpoint_dir.
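In other words, the edit amounts to something like this (a hypothetical sketch; checkpoint_dir's value and the surrounding variable names are assumptions about the code around lines 138-151, not the file's actual contents):

```python
import os

# Illustrative final-checkpoint path; the real value comes from the task config.
checkpoint_dir = os.path.join("..", "results", "checkpoint")

# After removing the loop over per-epoch mid-checkpoint folders,
# point both variables at the final checkpoint directory instead:
mid_checkpoint_dir = checkpoint_dir  # was: a per-epoch subfolder
save_path = checkpoint_dir           # generated samples go here

print(mid_checkpoint_dir)
print(save_path)
```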
I increased the number of epochs to 20. Now I get a different error while training. Could you please take a look at the log file?
It seems like you are running on a Windows system. Could you change "result_root_folder": "../results/" to "result_root_folder": "..\\results\\" in both config files and try again?
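A portable alternative (assuming the config accepts any valid path string) is to build the path with os.path.join, which picks the correct separator on both Linux and Windows:

```python
import os

# os.path.join uses "/" on Linux/macOS and "\\" on Windows,
# so the same config works on both systems.
result_root_folder = os.path.join("..", "results") + os.sep
print(result_root_folder)  # "../results/" on Linux, "..\results\" on Windows
```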
It's working now. Thanks for your help!