NaN in policy loss
TimZaman opened this issue · 0 comments
TimZaman commented
$ kubectl logs job1-optimizer-master-0
2019-02-03 08:08:55,542 INFO main(rmq_host=job1-rmq.default.svc.cluster.local, rmq_port=5672, epochs=4 seq_per_epoch=32, batch_size=8, seq_len=256 learning_rate=1e-05, pretrained_model=None, mq_prefetch_count=4, entropy_coef=0.0001)
2019-02-03 08:08:55,542 INFO init_distribution
2019-02-03 08:08:55,543 WARNING skipping distribution: world size too small (1)
2019-02-03 08:08:55,555 INFO Checkpointing to: exp3/job1
2019-02-03 08:08:55,820 INFO Found a latest model in pretrained dir: exp3/job1/model_000000018.pt
2019-02-03 08:08:55,821 INFO Downloading: exp3/job1/model_000000018.pt
2019-02-03 08:08:55,967 INFO Connected to RMQ
2019-02-03 08:08:56,118 INFO iteration 19/10000
2019-02-03 08:09:04,984 INFO epoch 1/4
2019-02-03 08:09:09,354 INFO epoch 2/4
2019-02-03 08:09:13,669 INFO epoch 3/4
2019-02-03 08:09:17,948 INFO epoch 4/4
2019-02-03 08:09:22,256 INFO steps_per_s=321.15, avg_weight_age=2.8, reward_per_sec=-0.0045, loss=0.0894, entropy=7.509, advantage=-0.047
2019-02-03 08:09:22,535 INFO iteration 20/10000
2019-02-03 08:09:36,020 INFO epoch 1/4
2019-02-03 08:09:40,070 INFO epoch 2/4
2019-02-03 08:09:44,142 INFO epoch 3/4
2019-02-03 08:09:48,175 INFO epoch 4/4
2019-02-03 08:09:52,276 INFO steps_per_s=273.46, avg_weight_age=2.8, reward_per_sec=-0.0024, loss=0.3443, entropy=7.537, advantage=-0.098
2019-02-03 08:09:52,509 INFO iteration 21/10000
2019-02-03 08:10:05,592 INFO epoch 1/4
2019-02-03 08:10:09,640 INFO epoch 2/4
2019-02-03 08:10:13,671 INFO epoch 3/4
2019-02-03 08:10:17,794 INFO epoch 4/4
2019-02-03 08:10:22,531 INFO steps_per_s=287.60, avg_weight_age=2.8, reward_per_sec=-0.0050, loss=0.3568, entropy=7.509, advantage=0.088
2019-02-03 08:10:22,784 INFO iteration 22/10000
2019-02-03 08:10:33,289 INFO epoch 1/4
2019-02-03 08:10:37,302 INFO epoch 2/4
2019-02-03 08:10:41,340 INFO epoch 3/4
2019-02-03 08:10:45,373 INFO epoch 4/4
2019-02-03 08:10:49,536 INFO steps_per_s=322.51, avg_weight_age=13.8, reward_per_sec=-0.0008, loss=0.2533, entropy=7.513, advantage=0.070
2019-02-03 08:10:49,779 INFO iteration 23/10000
2019-02-03 08:11:01,499 INFO epoch 1/4
2019-02-03 08:11:05,246 INFO epoch 2/4
2019-02-03 08:11:09,120 INFO epoch 3/4
2019-02-03 08:11:12,983 INFO epoch 4/4
2019-02-03 08:11:16,889 INFO steps_per_s=299.38, avg_weight_age=12.0, reward_per_sec=-0.0036, loss=0.3263, entropy=7.541, advantage=0.130
2019-02-03 08:11:17,135 INFO iteration 24/10000
tzaman@Tims-Mac-Pro dotaclient (master) $ kubectl logs job1-optimizer-master-0 -p
2019-02-03 07:58:09,186 INFO main(rmq_host=job1-rmq.default.svc.cluster.local, rmq_port=5672, epochs=4 seq_per_epoch=32, batch_size=8, seq_len=256 learning_rate=1e-05, pretrained_model=None, mq_prefetch_count=4, entropy_coef=0.0001)
2019-02-03 07:58:09,186 INFO init_distribution
2019-02-03 07:58:09,186 WARNING skipping distribution: world size too small (1)
2019-02-03 07:58:09,198 INFO Checkpointing to: exp3/job1
2019-02-03 07:58:09,442 INFO Connected to RMQ
2019-02-03 07:58:09,649 INFO iteration 1/10000
2019-02-03 08:00:13,929 INFO epoch 1/4
2019-02-03 08:00:18,560 INFO epoch 2/4
2019-02-03 08:00:22,967 INFO epoch 3/4
2019-02-03 08:00:27,461 INFO epoch 4/4
2019-02-03 08:00:32,009 INFO steps_per_s=59.25, avg_weight_age=0.0, reward_per_sec=-0.0082, loss=0.3138, entropy=7.515, advantage=0.227
2019-02-03 08:00:32,601 INFO iteration 2/10000
2019-02-03 08:00:34,976 INFO epoch 1/4
2019-02-03 08:00:39,304 INFO epoch 2/4
2019-02-03 08:00:43,549 INFO epoch 3/4
2019-02-03 08:00:47,804 INFO epoch 4/4
2019-02-03 08:00:52,170 INFO steps_per_s=407.36, avg_weight_age=1.0, reward_per_sec=-0.0057, loss=0.1834, entropy=7.549, advantage=0.153
2019-02-03 08:00:52,454 INFO iteration 3/10000
2019-02-03 08:00:54,928 INFO epoch 1/4
2019-02-03 08:00:58,679 INFO epoch 2/4
2019-02-03 08:01:02,412 INFO epoch 3/4
2019-02-03 08:01:06,143 INFO epoch 4/4
2019-02-03 08:01:10,061 INFO steps_per_s=457.79, avg_weight_age=2.0, reward_per_sec=-0.0045, loss=0.1381, entropy=7.549, advantage=0.109
2019-02-03 08:01:10,330 INFO iteration 4/10000
2019-02-03 08:01:36,341 INFO epoch 1/4
2019-02-03 08:01:40,406 INFO epoch 2/4
2019-02-03 08:01:44,472 INFO epoch 3/4
2019-02-03 08:01:48,500 INFO epoch 4/4
2019-02-03 08:01:52,630 INFO steps_per_s=204.48, avg_weight_age=3.0, reward_per_sec=-0.0052, loss=0.2404, entropy=7.518, advantage=0.120
2019-02-03 08:01:52,858 INFO iteration 5/10000
2019-02-03 08:01:55,481 INFO epoch 1/4
2019-02-03 08:01:59,453 INFO epoch 2/4
2019-02-03 08:02:03,407 INFO epoch 3/4
2019-02-03 08:02:07,419 INFO epoch 4/4
2019-02-03 08:02:11,792 INFO steps_per_s=441.39, avg_weight_age=4.0, reward_per_sec=-0.0048, loss=0.2777, entropy=7.482, advantage=0.130
2019-02-03 08:02:12,060 INFO iteration 6/10000
2019-02-03 08:02:15,000 INFO epoch 1/4
2019-02-03 08:02:19,372 INFO epoch 2/4
2019-02-03 08:02:23,516 INFO epoch 3/4
2019-02-03 08:02:27,732 INFO epoch 4/4
2019-02-03 08:02:32,221 INFO steps_per_s=425.83, avg_weight_age=5.0, reward_per_sec=-0.0029, loss=0.2681, entropy=7.498, advantage=-0.018
2019-02-03 08:02:32,489 INFO iteration 7/10000
2019-02-03 08:03:00,428 INFO epoch 1/4
2019-02-03 08:03:04,404 INFO epoch 2/4
2019-02-03 08:03:08,522 INFO epoch 3/4
2019-02-03 08:03:12,645 INFO epoch 4/4
2019-02-03 08:03:16,870 INFO steps_per_s=189.22, avg_weight_age=4.0, reward_per_sec=-0.0069, loss=0.2963, entropy=7.481, advantage=0.130
2019-02-03 08:03:17,154 INFO iteration 8/10000
2019-02-03 08:03:19,849 INFO epoch 1/4
2019-02-03 08:03:23,899 INFO epoch 2/4
2019-02-03 08:03:28,062 INFO epoch 3/4
2019-02-03 08:03:32,274 INFO epoch 4/4
2019-02-03 08:03:36,499 INFO steps_per_s=443.39, avg_weight_age=4.9, reward_per_sec=-0.0046, loss=0.2225, entropy=7.487, advantage=0.121
2019-02-03 08:03:36,767 INFO iteration 9/10000
2019-02-03 08:03:40,356 INFO epoch 1/4
2019-02-03 08:03:44,524 INFO epoch 2/4
2019-02-03 08:03:48,800 INFO epoch 3/4
2019-02-03 08:03:53,079 INFO epoch 4/4
2019-02-03 08:03:57,397 INFO steps_per_s=428.69, avg_weight_age=5.8, reward_per_sec=-0.0031, loss=0.1637, entropy=7.529, advantage=-0.027
2019-02-03 08:03:57,624 INFO iteration 10/10000
2019-02-03 08:04:09,730 INFO epoch 1/4
2019-02-03 08:04:13,922 INFO epoch 2/4
2019-02-03 08:04:18,159 INFO epoch 3/4
2019-02-03 08:04:22,414 INFO epoch 4/4
2019-02-03 08:04:26,668 INFO steps_per_s=297.42, avg_weight_age=4.4, reward_per_sec=-0.0052, loss=0.2684, entropy=7.551, advantage=0.137
2019-02-03 08:04:26,937 INFO iteration 11/10000
2019-02-03 08:04:40,910 INFO epoch 1/4
2019-02-03 08:04:44,869 INFO epoch 2/4
2019-02-03 08:04:48,852 INFO epoch 3/4
2019-02-03 08:04:52,930 INFO epoch 4/4
2019-02-03 08:04:57,053 INFO steps_per_s=277.99, avg_weight_age=5.0, reward_per_sec=-0.0055, loss=0.3511, entropy=7.519, advantage=0.157
2019-02-03 08:04:57,328 INFO iteration 12/10000
2019-02-03 08:05:08,032 INFO epoch 1/4
2019-02-03 08:05:12,169 INFO epoch 2/4
2019-02-03 08:05:16,640 INFO epoch 3/4
2019-02-03 08:05:20,895 INFO epoch 4/4
2019-02-03 08:05:25,137 INFO steps_per_s=309.99, avg_weight_age=5.4, reward_per_sec=-0.0043, loss=0.1677, entropy=7.552, advantage=0.002
2019-02-03 08:05:25,377 INFO iteration 13/10000
2019-02-03 08:05:32,969 INFO epoch 1/4
2019-02-03 08:05:36,927 INFO epoch 2/4
2019-02-03 08:05:40,960 INFO epoch 3/4
2019-02-03 08:05:45,058 INFO epoch 4/4
2019-02-03 08:05:49,178 INFO steps_per_s=351.36, avg_weight_age=4.3, reward_per_sec=-0.0046, loss=0.3550, entropy=7.511, advantage=0.136
2019-02-03 08:05:49,449 INFO iteration 14/10000
2019-02-03 08:06:07,304 INFO epoch 1/4
2019-02-03 08:06:11,466 INFO epoch 2/4
2019-02-03 08:06:15,637 INFO epoch 3/4
2019-02-03 08:06:19,784 INFO epoch 4/4
2019-02-03 08:06:24,006 INFO steps_per_s=242.56, avg_weight_age=4.4, reward_per_sec=-0.0027, loss=0.1183, entropy=7.534, advantage=0.041
2019-02-03 08:06:24,288 INFO iteration 15/10000
2019-02-03 08:06:28,117 INFO epoch 1/4
2019-02-03 08:06:32,300 INFO epoch 2/4
2019-02-03 08:06:36,325 INFO epoch 3/4
2019-02-03 08:06:40,490 INFO epoch 4/4
2019-02-03 08:06:44,682 INFO steps_per_s=408.59, avg_weight_age=4.5, reward_per_sec=-0.0066, loss=0.3987, entropy=7.405, advantage=0.203
2019-02-03 08:06:44,939 INFO iteration 16/10000
2019-02-03 08:06:58,538 INFO epoch 1/4
2019-02-03 08:07:02,430 INFO epoch 2/4
2019-02-03 08:07:06,373 INFO epoch 3/4
2019-02-03 08:07:10,318 INFO epoch 4/4
2019-02-03 08:07:14,332 INFO steps_per_s=276.31, avg_weight_age=4.3, reward_per_sec=-0.0044, loss=0.3123, entropy=7.525, advantage=0.166
2019-02-03 08:07:14,598 INFO iteration 17/10000
2019-02-03 08:07:32,498 INFO epoch 1/4
2019-02-03 08:07:36,640 INFO epoch 2/4
2019-02-03 08:07:40,742 INFO epoch 3/4
2019-02-03 08:07:44,962 INFO epoch 4/4
2019-02-03 08:07:49,240 INFO steps_per_s=249.25, avg_weight_age=4.9, reward_per_sec=-0.0031, loss=0.1578, entropy=7.548, advantage=-0.241
2019-02-03 08:07:49,485 INFO iteration 18/10000
2019-02-03 08:07:52,115 INFO epoch 1/4
2019-02-03 08:07:55,908 INFO epoch 2/4
2019-02-03 08:07:59,834 INFO epoch 3/4
2019-02-03 08:08:03,765 INFO epoch 4/4
2019-02-03 08:08:07,733 INFO steps_per_s=443.28, avg_weight_age=4.6, reward_per_sec=-0.0054, loss=0.3175, entropy=7.556, advantage=0.137
2019-02-03 08:08:07,999 INFO iteration 19/10000
2019-02-03 08:08:15,437 INFO epoch 1/4
2019-02-03 08:08:19,210 INFO epoch 2/4
2019-02-03 08:08:23,094 INFO epoch 3/4
2019-02-03 08:08:27,007 INFO epoch 4/4
2019-02-03 08:08:30,989 INFO steps_per_s=352.08, avg_weight_age=4.8, reward_per_sec=-0.0024, loss=0.2117, entropy=7.556, advantage=-0.033
2019-02-03 08:08:31,275 INFO iteration 20/10000
2019-02-03 08:08:45,272 INFO epoch 1/4
2019-02-03 08:08:49,297 INFO epoch 2/4
Traceback (most recent call last):
File "optimizer.py", line 746, in <module>
run_local=args.run_local,
File "optimizer.py", line 702, in main
dota_optimizer.run()
File "optimizer.py", line 441, in run
loss, entropy_d, advantage = self.train(experiences=batch)
File "optimizer.py", line 619, in train
loss, policy_loss, entropy_loss, advantage_loss))
ValueError: loss=nan, policy_loss=nan, entropy_loss=0.00010979062062688172, advantage_loss=0.003007180755957961