CyberZHG/keras-radam

Cannot start training with TensorFlow 2.0 and distribute.MirroredStrategy

makercob opened this issue · 6 comments

Describe the Bug
Cannot start training with TensorFlow 2.0 and distribute.MirroredStrategy.

Version Info
TensorFlow 2.0.0-beta1
Python 3.6.8

  • [x] I'm using the latest version

Minimal Code To Reproduce

import os
os.environ['TF_KERAS'] = '1'  # tell keras-radam to use the tf.keras implementation
import tensorflow as tf
from keras_radam import RAdam

# model, train_dataset, and FLAGS are defined elsewhere
strategy = tf.distribute.MirroredStrategy(
    devices=FLAGS.compute_devices,
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())

with strategy.scope():
    optimizer = RAdam(learning_rate=1e-3)
    model.compile(optimizer=optimizer, ..., run_eagerly=False)
    model.fit(train_dataset)

[Screenshot attached: error traceback from the failed run, 2019-08-21]

Try changing this line

self._set_hyper('total_steps', total_steps)

to

self._set_hyper('total_steps', float(total_steps))

?
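For context, here is a minimal sketch of where that cast lands. The constructor is abridged and the argument list is assumed from the keras-radam source; the point is only that total_steps should be stored as a float hyperparameter:

import tensorflow as tf

class RAdam(tf.keras.optimizers.Optimizer):
    def __init__(self, learning_rate=1e-3, total_steps=0, name='RAdam', **kwargs):
        super(RAdam, self).__init__(name, **kwargs)
        self._set_hyper('learning_rate', learning_rate)
        # Storing a Python int makes this hyperparameter an integer tensor,
        # which appears to trip a dtype error once MirroredStrategy wraps the
        # optimizer's hyper variables; casting to float keeps it consistent
        # with the other (float) hyperparameters.
        self._set_hyper('total_steps', float(total_steps))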

@CyberZHG Thanks, it works.
But it seems much slower than the native optimizers.

I think that's because the optimizer is implemented in pure Python (compared with the C++ & CUDA kernels behind the native optimizers).

My bad. Issues in my own code were probably the culprit of the performance drop.

@CyberZHG Compared to the native SGD optimizer, training time per epoch is the same, but convergence is much faster. Thanks!

I've made a new release for this issue. You can upgrade to 0.7.0 if you're using pip.
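For anyone landing here later: the project is published on PyPI as keras-rectified-adam (assuming the package name from the project README), so the upgrade is:

pip install -U keras-rectified-adam==0.7.0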