ELEKTRONN/elektronn3

train_unet_neurodata.py default seed is a seed of 0, not a random seed

dmankins opened this issue · 2 comments

train_unet_neurodata.py has a default of 0 for the --seed option, meaning the PRNGs are seeded with 0. A seed of 0 is not the same as a seed of None; it behaves like any other fixed integer seed. This means that, unless you specify a seed, all runs of train_unet_neurodata.py will use the same sequence of values from the PRNGs.

Since there is no provision for getting different runs out of the program, I assume this is not what is intended.
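
To illustrate the difference, here is a minimal standalone sketch (independent of the training script) contrasting a fixed seed with a seed of None:

import random

# A fixed seed, 0 or otherwise, reproduces the same sequence on every run.
random.seed(0)
print([random.random() for _ in range(3)])  # identical across runs

# Seeding with None (the no-argument default) draws entropy from the OS,
# so each run produces a different sequence.
random.seed()
print([random.random() for _ in range(3)])  # varies across runs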

One might consider using -1 as the default value for the --seed option, and then using None as a seed if args.seed is -1:

index 0b42797..e9630de 100644
--- a/examples/train_unet_neurodata.py
+++ b/examples/train_unet_neurodata.py
@@ -46,7 +46,8 @@ parser.add_argument(
 "onsave": Use regular Python model for training, but trace it on-demand for saving training state;
 "train": Use traced model for training and serialize it on disk"""
 )
-parser.add_argument('--seed', type=int, default=0, help='Base seed for all RNGs.')
+# Use an illegal seed value to indicate "no seed" (0 is a seed of 0, not random at all)
+parser.add_argument('--seed', type=int, default=-1, help='Base seed for all RNGs.')
 parser.add_argument(
     '--deterministic', action='store_true',
     help='Run in fully deterministic mode (at the cost of execution speed).'
@@ -55,9 +56,15 @@ args = parser.parse_args()
 
 # Set up all RNG seeds, set level of determinism
 random_seed = args.seed
-torch.manual_seed(random_seed)
-np.random.seed(random_seed)
-random.seed(random_seed)
+if random_seed < 0:
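+    # Leave torch at its default nondeterministic seed; reseed the other RNGs from OS entropy.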
+    np.random.seed()
+    random.seed()
+else:
+    torch.manual_seed(random_seed)
+    np.random.seed(random_seed)
+    random.seed(random_seed)
+
 deterministic = args.deterministic
 if deterministic:
     torch.backends.cudnn.deterministic = True
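
An alternative to reserving -1 as a sentinel would be to make None itself the default: argparse leaves the attribute at None when the option is omitted, and applies type=int only to values actually passed on the command line. A minimal sketch of that variant (hypothetical, not the script's current code):

import argparse
import random

import numpy as np
import torch

parser = argparse.ArgumentParser()
# With default=None, omitting --seed leaves args.seed as None.
parser.add_argument('--seed', type=int, default=None,
                    help='Base seed for all RNGs (omit for nondeterministic runs).')
args = parser.parse_args()

if args.seed is None:
    # Seed numpy and random from OS entropy; torch already starts with a
    # nondeterministic default seed, so no call is needed for it here.
    np.random.seed()
    random.seed()
else:
    torch.manual_seed(args.seed)
    np.random.seed(args.seed)
    random.seed(args.seed)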
mdraw commented

> This means that, unless you specify a seed, all runs of train_unet_neurodata.py will use the same sequence of values from the PRNGs

This is actually the intended behavior. The reason for this is that a fixed deterministic seed

  1. allows comparing training runs with different hyperparameter settings on the same training data, and
  2. makes training runs reproducible by default (see #28).

Using a randomly chosen seed makes it impossible to cleanly compare training runs, because randomness can have a significant influence on training outcomes (speaking from personal experience).
Using 0 as the default seed was an arbitrary choice.
If you want to deliberately test how randomness influences a training run, the idea is to set the --seed option to a different value manually in each run.

Is there an advantage to letting the seed be nondeterministic by default?

If that is the intention, then there is no problem.

Perhaps a note in the help string for the --seed option ("if you want randomness, choose a different seed for each run") would be appropriate. In my experience, an option for specifying a seed exists to allow reproducibility, and the default is a random seed (which I assume is why random.seed and np.random.seed default to None).
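
For instance, the script's existing --seed declaration could carry that note; one possible wording (a sketch, keeping the current default of 0):

parser.add_argument(
    '--seed', type=int, default=0,
    help='Base seed for all RNGs. The default of 0 makes runs reproducible; '
         'choose a different seed for each run if you want varying randomness.')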

(Feel free to close this issue.)