ValueError: Variable DoppelGANgerGenerator/attribute_real/layer0/linear/matrix/Adam/ already exists, disallowed.
lurw2000 opened this issue · 5 comments
I just followed the instructions and ran the script driver.py. Here is the error message:
```
Traceback (most recent call last):
File "/home/runwei/NetShare/netshare/models/model.py", line 27, in train
log_folder=log_folder)
File "/home/runwei/NetShare/netshare/models/doppelganger_tf_model.py", line 176, in _train
gan.build()
File "/home/runwei/NetShare/netshare/models/doppelganger_tf/doppelganger.py", line 293, in build
self.build_loss()
File "/home/runwei/NetShare/netshare/models/doppelganger_tf/doppelganger.py", line 708, in build_loss
self.g_loss, var_list=self.generator.trainable_vars
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/optimizer.py", line 413, in minimize
name=name)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/optimizer.py", line 597, in apply_gradients
self._create_slots(var_list)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/adam.py", line 131, in _create_slots
self._zeros_slot(v, "m", self._name)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/optimizer.py", line 1156, in _zeros_slot
new_slot_variable = slot_creator.create_zeros_slot(var, op_name)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py", line 190, in create_zeros_slot
colocate_with_primary=colocate_with_primary)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py", line 164, in create_slot_with_initializer
dtype)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/training/slot_creator.py", line 74, in _create_slot_var
validate_shape=validate_shape)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 1500, in get_variable
aggregation=aggregation)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 1243, in get_variable
aggregation=aggregation)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 567, in get_variable
aggregation=aggregation)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 519, in _true_getter
aggregation=aggregation)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/ops/variable_scope.py", line 868, in _get_single_variable
(err_msg, "".join(traceback.format_list(tb))))
ValueError: Variable DoppelGANgerGenerator/attribute_real/layer0/linear/matrix/Adam/ already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 1748, in __init__
self._traceback = tf_stack.extract_stack()
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3426, in _create_op_internal
op_def=op_def)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/framework/ops.py", line 3357, in create_op
attrs, op_def, compute_device)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/util/deprecation.py", line 507, in new_func
return func(*args, **kwargs)
File "/home/runwei/anaconda3/envs/netshare/lib/python3.6/site-packages/tensorflow_core/python/framework/op_def_library.py", line 794, in _apply_op_helper
op_def=op_def)
```
Hi, thank you for your interest in NetShare. To reproduce the error on our end, could you please let us know:
- Which dataset are you running? We have not fully tested the framework on the main branch, but we have never seen this error before.
- Are you running NetShare on a cluster or on a single machine?
- Is the Ray package installed and enabled (ray.config.enabled=True)?
I'm running on a single machine (Ubuntu 22.04.1) with Ray turned off.
The driver.py looks like this:
```python
import netshare.ray as ray
from netshare import Generator

if __name__ == '__main__':
    ray.config.enabled = False
    generator = Generator(config="netflow/config_example_netflow_nodp.json")
    generator.train_and_generate(work_folder='../results/netflow/test')
```
Thanks. We will look into it and get back to you.
Sorry for the delay. It took us some time to pinpoint the issue as we mainly use Ray=ON and a cluster for dev/test.
The problem is that when Ray is OFF on a single machine, everything runs sequentially in the same process, so multiple TF model instances end up sharing one default graph, which causes the "variable already exists" error.
The solution is to add a code snippet to the train/generate functions that resets the TF graph each time they start; see the snippets referenced below:
- NetShare/netshare/models/doppelganger_tf_model.py, lines 25 to 27 (commit e737d73)
- NetShare/netshare/models/doppelganger_tf_model.py, lines 191 to 193 (commit e737d73)
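For reference, the idea is simply to clear TensorFlow's default graph at the top of each training/generation call. The snippet below is a minimal standalone sketch (not the NetShare source), assuming TF 1.x-style graph mode; it shows how the reset lets the same process build and train a model twice without hitting the "variable already exists" error from this issue:

```python
# Minimal sketch (not the NetShare code): reset the default graph so that
# sequential runs in one process do not collide on variable names.
import tensorflow.compat.v1 as tf

tf.disable_eager_execution()

def build_and_train_once():
    # Clear any graph left behind by a previous run in this process.
    tf.reset_default_graph()
    with tf.variable_scope("DoppelGANgerGenerator"):
        w = tf.get_variable("matrix", shape=[4, 4])
    # Adam creates slot variables under the same scope (e.g. .../matrix/Adam),
    # which is exactly what collides when the graph is not reset.
    train_op = tf.train.AdamOptimizer(1e-3).minimize(tf.reduce_sum(tf.square(w)))
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        sess.run(train_op)

# Without the reset_default_graph() call, the second invocation would raise
# the ValueError reported in this issue.
build_and_train_once()
build_and_train_once()
```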
We have updated the scripts and README. Please pull the latest codebase, check the README, and let us know if you encounter any further problems.
Side note: running on a single machine with Ray=OFF will take an extremely long time to finish. We would recommend using a cluster if possible. Alternatively, for quick validation purposes (regardless of fidelity), you may follow Tip 1 of Example Usage and set a very small training iteration number to get a sense of running NetShare end-to-end, e.g. as in the sketch below.
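For illustration only, here is one hypothetical way to shrink the iteration count before launching driver.py. The config key path ("model" -> "config" -> "iteration") is an assumption for the sketch; check Tip 1 of Example Usage and the actual example config for the real schema:

```python
# Hypothetical quick-validation tweak: the key names below are assumptions,
# not the confirmed NetShare config schema.
import json

cfg_path = "netflow/config_example_netflow_nodp.json"
with open(cfg_path) as f:
    cfg = json.load(f)

# Assumed location of the training-iteration knob; adjust to the real schema.
cfg.setdefault("model", {}).setdefault("config", {})["iteration"] = 5

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)
```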
Closing this issue since there have been no further updates. Feel free to create a new one or reopen it if you have any other questions.