ELEKTRONN/elektronn3

train_unet_neurodata.py default seed is a seed of 0, not a random seed

dmankins opened this issue · 2 comments

train_unet_neurodata.py has a default of 0 for the --seed option, meaning the PRNGs are seeded with 0. A seed of 0 is not the same as a seed of None; it behaves like any other fixed integer seed. This means that, unless you specify a seed, all runs of train_unet_neurodata.py will use the same sequence of values from the PRNGs.

Since there is no provision for getting different runs out of the program, I assume this is not what is intended.
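
To illustrate the difference, here is a minimal standalone sketch (independent of the training script) contrasting a fixed seed with a seed of None:

import random

# A fixed seed, 0 or otherwise, reproduces the same sequence on every run.
random.seed(0)
print([random.random() for _ in range(3)])  # identical across runs

# Seeding with None (the no-argument default) draws entropy from the OS,
# so each run produces a different sequence.
random.seed()
print([random.random() for _ in range(3)])  # varies across runs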

One might consider using -1 as the default value for the --seed option, and then using None as a seed if args.seed is -1:

index 0b42797..e9630de 100644
--- a/examples/train_unet_neurodata.py
+++ b/examples/train_unet_neurodata.py
@@ -46,7 +46,8 @@ parser.add_argument(
 "onsave": Use regular Python model for training, but trace it on-demand for saving training state;
 "train": Use traced model for training and serialize it on disk"""
 )
-parser.add_argument('--seed', type=int, default=0, help='Base seed for all RNGs.')
+# Use an illegal seed value to indicate "no seed" (0 is a seed of 0, not random at all)
+parser.add_argument('--seed', type=int, default=-1, help='Base seed for all RNGs.')
 parser.add_argument(
     '--deterministic', action='store_true',
     help='Run in fully deterministic mode (at the cost of execution speed).'
@@ -55,9 +56,15 @@ args = parser.parse_args()
 
 # Set up all RNG seeds, set level of determinism
 random_seed = args.seed
-torch.manual_seed(random_seed)
-np.random.seed(random_seed)
-random.seed(random_seed)
+if random_seed < 0:
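+    # Leave torch at its default nondeterministic seed; reseed the other RNGs from OS entropy.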
+    np.random.seed()
+    random.seed()
+else:
+    torch.manual_seed(random_seed)
+    np.random.seed(random_seed)
+    random.seed(random_seed)
+
 deterministic = args.deterministic
 if deterministic:
     torch.backends.cudnn.deterministic = True
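
An alternative to reserving -1 as a sentinel would be to make None itself the default: argparse leaves the attribute at None when the option is omitted, and applies type=int only to values actually passed on the command line. A minimal sketch of that variant (hypothetical, not the script's current code):

import argparse
import random

import numpy as np
import torch

parser = argparse.ArgumentParser()
# With default=None, omitting --seed leaves args.seed as None.
parser.add_argument('--seed', type=int, default=None,
                    help='Base seed for all RNGs (omit for nondeterministic runs).')
args = parser.parse_args()

if args.seed is None:
    # Seed numpy and random from OS entropy; torch already starts with a
    # nondeterministic default seed, so no call is needed for it here.
    np.random.seed()
    random.seed()
else:
    torch.manual_seed(args.seed)
    np.random.seed(args.seed)
    random.seed(args.seed)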
mdraw commented

> This means that, unless you specify a seed, all runs of train_unet_neurodata.py will use the same sequence of values from the PRNGs

This is actually the intended behavior. The reason for this is that a fixed deterministic seed

  1. allows comparing training runs with different hyperparameter settings on the same training data, and
  2. makes training runs reproducible by default (see #28).

Using a randomly chosen seed makes it impossible to cleanly compare training runs, because randomness can have a significant influence on training outcomes (speaking from personal experience).
Using 0 as the default seed was an arbitrary choice.
If you want to deliberately test how randomness influences a training run, the idea is to set the --seed option to a different value manually in each run.

Is there an advantage to letting the seed be nondeterministic by default?

If that is the intention, then there is no problem.

Perhaps a note in the help string for the --seed option ("if you want randomness, choose a different seed for each run") would be appropriate. In my experience, an option for specifying a seed exists to allow reproducibility, and the default is a random seed (which I assume is why random.seed and np.random.seed default to None).
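
For instance, the script's existing --seed declaration could carry that note; one possible wording (a sketch, keeping the current default of 0):

parser.add_argument(
    '--seed', type=int, default=0,
    help='Base seed for all RNGs. The default of 0 makes runs reproducible; '
         'choose a different seed for each run if you want varying randomness.')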

(Feel free to close this issue.)