Cannot start training with TensorFlow 2.0 and distribute.MirroredStrategy
makercob opened this issue · 6 comments
makercob commented
Describe the Bug
Cannot start training with TensorFlow 2.0 and distribute.MirroredStrategy.
Version Info
TensorFlow 2.0.0-beta1
Python 3.6.8
- [x] I'm using the latest version
Minimal Codes To Reproduce
strategy = tf.distribute.MirroredStrategy(
    devices=FLAGS.compute_devices,
    cross_device_ops=tf.distribute.HierarchicalCopyAllReduce())
with strategy.scope():
    optimizer = RAdam(learning_rate=1e-3)
    model.compile(optimizer=optimizer, ..., run_eagerly=False)
model.fit(train_dataset)
CyberZHG commented
Try changing this line
keras-radam/keras_radam/optimizer_v2.py
Line 65 in 7bfd0c0
to

    self._set_hyper('total_steps', float(total_steps))

so the hyperparameter is stored as a float?
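The cast matters because `_set_hyper` stores the value in the optimizer's hyperparameter storage, and an integer `total_steps` ends up with an integer dtype that clashes with the float arithmetic of the warmup schedule when variables are created under `MirroredStrategy`. A minimal sketch of the idea, using a plain dict in place of Keras' hyperparameter storage (the helper name `set_hyper` here is hypothetical, not the library's API):

```python
def set_hyper(hypers, name, value):
    # Mimic the fix: coerce numeric hyperparameters to float so their
    # dtype matches the float learning-rate math downstream.
    hypers[name] = float(value)

hypers = {}
set_hyper(hypers, 'total_steps', 10000)       # int from user code
set_hyper(hypers, 'warmup_proportion', 0.1)

# The warmup-schedule arithmetic now stays in float throughout.
warmup_steps = hypers['total_steps'] * hypers['warmup_proportion']
```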
CyberZHG commented
I think it's because the optimizer is implemented in pure Python (compared to the C++ & CUDA implementations of the built-in optimizers).
makercob commented
My bad. Issues in my own code might be the culprit of the performance drop.
makercob commented
@CyberZHG Compared to the native SGD optimizer, the same training time per epoch is observed. Convergence, however, is much faster. Thanks!
CyberZHG commented
I've made a new release for this issue. You can upgrade to 0.7.0 if you're using pip.
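For reference, the upgrade command would look like this (assuming the package is published on PyPI under the name keras-rectified-adam; check the repo's README for the exact name):

```shell
pip install -U keras-rectified-adam==0.7.0
```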