perf on FetchPush-v1
huangjiancong1 opened this issue · 16 comments
It seems the final performance is hard to improve when using FetchPush-v1; see the bottom of the output.
main algo : PeNFAC(lambda)-V
episode 0 total steps 0 last perf 0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 0 50 -50.000000 -39.4993932862 -50.0000000000 0.00000 0.20000 0 0.000 49.710 82.538
episode 100 total steps 5000 last perf 0.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 100 50 -50.000 -39.4993932862 -50.0000000000 305.25273 0.20000 250 0.416 742.022 1764.851
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 100 50 -50.000 -39.4993932862 -50.0000000000 305.25273 0.20000 250 0.416 742.022 1764.851
episode 200 total steps 10000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 200 50 -50.000 -39.4993932862 -50.0000000000 439.32512 0.20000 250 0.300 874.353 3509.462
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 200 50 -50.000 -39.4993932862 -50.0000000000 439.32512 0.20000 250 0.300 874.353 3509.462
episode 300 total steps 15000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 300 50 -50.000 -39.4993932862 -50.0000000000 405.47166 0.20000 250 0.296 991.178 3248.775
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 300 50 -50.000 -39.4993932862 -50.0000000000 405.47166 0.20000 250 0.296 991.178 3248.775
episode 400 total steps 20000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 400 50 -50.000 -39.4993932862 -50.0000000000 288.83902 0.20000 250 0.408 1073.500 2395.008
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 400 50 -50.000 -39.4993932862 -50.0000000000 288.83902 0.20000 250 0.408 1073.500 2395.008
episode 500 total steps 25000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 500 50 -50.000 -39.4993932862 -50.0000000000 434.30841 0.20000 250 0.300 1221.075 2090.089
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 500 50 -50.000 -39.4993932862 -50.0000000000 434.30841 0.20000 250 0.300 1221.075 2090.089
episode 600 total steps 30000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 600 50 -50.000 -39.4993932862 -50.0000000000 425.04641 0.20000 250 0.300 1245.507 3301.051
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 600 50 -50.000 -39.4993932862 -50.0000000000 425.04641 0.20000 250 0.300 1245.507 3301.051
episode 700 total steps 35000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 700 50 -50.000 -39.4993932862 -50.0000000000 395.15348 0.20000 250 0.288 1453.767 2843.399
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 700 50 -50.000 -39.4993932862 -50.0000000000 395.15348 0.20000 250 0.288 1453.767 2843.399
episode 800 total steps 40000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 800 50 -7.000 -6.7934652093 -7.0000000000 343.70716 0.20000 250 0.340 1426.707 2757.104
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 800 50 -50.000 -39.4993932862 -50.0000000000 343.70716 0.20000 250 0.340 1426.707 2757.104
episode 900 total steps 45000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 900 50 -50.000 -39.4993932862 -50.0000000000 313.93685 0.20000 250 0.344 1444.552 3218.356
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 900 50 -50.000 -39.4993932862 -50.0000000000 313.93685 0.20000 250 0.344 1444.552 3218.356
episode 1000 total steps 50000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 1000 50 -50.000 -39.4993932862 -50.0000000000 300.85292 0.20000 250 0.424 1414.471 4332.106
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 1000 50 -50.000 -39.4993932862 -50.0000000000 300.85292 0.20000 250 0.424 1414.471 4332.106
episode 1100 total steps 55000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 1100 50 -50.000 -39.4993932862 -50.0000000000 301.20900 0.20000 250 0.368 1413.057 3611.521
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 1100 50 -50.000 -39.4993932862 -50.0000000000 301.20900 0.20000 250 0.368 1413.057 3611.521
episode 1200 total steps 60000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 1200 50 -50.000 -39.4993932862 -50.0000000000 302.35951 0.20000 250 0.380 1421.981 3232.280
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 1200 50 -50.000 -39.4993932862 -50.0000000000 302.35951 0.20000 250 0.380 1421.981 3232.280
episode 1300 total steps 65000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 1300 50 -45.000 -34.5983982762 -45.0000000000 313.15894 0.20000 250 0.264 1431.092 3603.456
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 1300 50 -50.000 -39.4993932862 -50.0000000000 313.15894 0.20000 250 0.264 1431.092 3603.456
episode 1400 total steps 70000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 1400 50 -50.000 -39.4993932862 -50.0000000000 293.27581 0.20000 250 0.396 1432.452 2308.182
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 1400 50 -50.000 -39.4993932862 -50.0000000000 293.27581 0.20000 250 0.396 1432.452 2308.182
episode 1500 total steps 75000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 1500 50 -50.000 -39.4993932862 -50.0000000000 299.37081 0.20000 250 0.420 1440.287 2513.630
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 1500 50 -50.000 -39.4993932862 -50.0000000000 299.37081 0.20000 250 0.420 1440.287 2513.630
episode 1600 total steps 80000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 1600 50 0.000 0.0000000000 0.0000000000 188.98480 0.20000 250 0.364 1408.132 2232.789
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 1600 50 -50.000 -39.4993932862 -50.0000000000 188.98480 0.20000 250 0.364 1408.132 2232.789
episode 1700 total steps 85000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 1700 50 -50.000 -39.4993932862 -50.0000000000 288.04573 0.20000 250 0.436 1444.980 2849.277
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 1700 50 -50.000 -39.4993932862 -50.0000000000 288.04573 0.20000 250 0.436 1444.980 2849.277
episode 1800 total steps 90000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 1800 50 -50.000 -39.4993932862 -50.0000000000 308.32325 0.20000 250 0.392 1501.293 3345.464
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 1800 50 -50.000 -39.4993932862 -50.0000000000 308.32325 0.20000 250 0.392 1501.293 3345.464
episode 1900 total steps 95000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 1900 50 -50.000 -39.4993932862 -50.0000000000 331.21574 0.20000 250 0.312 1524.992 2625.236
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 1900 50 -50.000 -39.4993932862 -50.0000000000 331.21574 0.20000 250 0.312 1524.992 2625.236
episode 2000 total steps 100000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 2000 50 -50.000 -39.4993932862 -50.0000000000 342.53142 0.20000 250 0.304 1562.716 3283.008
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 2000 50 -50.000 -39.4993932862 -50.0000000000 342.53142 0.20000 250 0.304 1562.716 3283.008
episode 2100 total steps 105000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 2100 50 -50.000 -39.4993932862 -50.0000000000 314.89381 0.20000 250 0.348 1662.225 2576.668
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 2100 50 -50.000 -39.4993932862 -50.0000000000 314.89381 0.20000 250 0.348 1662.225 2576.668
episode 2200 total steps 110000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 2200 50 -50.000 -39.4993932862 -50.0000000000 325.43476 0.20000 250 0.368 1689.377 2315.436
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 2200 50 -50.000 -39.4993932862 -50.0000000000 325.43476 0.20000 250 0.368 1689.377 2315.436
episode 2300 total steps 115000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 2300 50 -50.000 -39.4993932862 -50.0000000000 323.92725 0.20000 250 0.336 1712.382 2859.400
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 2300 50 -50.000 -39.4993932862 -50.0000000000 323.92725 0.20000 250 0.336 1712.382 2859.400
episode 2400 total steps 120000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 2400 50 -50.000 -39.4993932862 -50.0000000000 302.64502 0.20000 250 0.380 1797.492 2897.479
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 2400 50 -50.000 -39.4993932862 -50.0000000000 302.64502 0.20000 250 0.380 1797.492 2897.479
episode 2500 total steps 125000 last perf 0.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 2500 50 -50.000 -39.4993932862 -50.0000000000 74.62224 0.20000 250 0.404 1732.337 3855.214
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 2500 50 -50.000 -39.4993932862 -50.0000000000 74.62224 0.20000 250 0.404 1732.337 3855.214
episode 2600 total steps 130000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 2600 50 -50.000 -39.4993932862 -50.0000000000 302.42988 0.20000 250 0.384 1821.087 3403.937
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 2600 50 -50.000 -39.4993932862 -50.0000000000 302.42988 0.20000 250 0.384 1821.087 3403.937
episode 2700 total steps 135000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 2700 50 -50.000 -39.4993932862 -50.0000000000 309.27921 0.20000 250 0.352 1825.681 2691.617
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 2700 50 -50.000 -39.4993932862 -50.0000000000 309.27921 0.20000 250 0.352 1825.681 2691.617
episode 2800 total steps 140000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 2800 50 -50.000 -39.4993932862 -50.0000000000 303.45071 0.20000 250 0.452 1823.744 2267.838
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 2800 50 -50.000 -39.4993932862 -50.0000000000 303.45071 0.20000 250 0.452 1823.744 2267.838
episode 2900 total steps 145000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 2900 50 -46.000 -35.5589942862 -46.0000000000 300.21120 0.20000 250 0.380 1832.871 2108.694
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 2900 50 -50.000 -39.4993932862 -50.0000000000 300.21120 0.20000 250 0.380 1832.871 2108.694
episode 3000 total steps 150000 last perf 0.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 3000 50 -50.000 -39.4993932862 -50.0000000000 296.41106 0.20000 250 0.396 1840.616 2203.382
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 3000 50 -50.000 -39.4993932862 -50.0000000000 296.41106 0.20000 250 0.396 1840.616 2203.382
episode 3100 total steps 155000 last perf -47.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 3100 50 -50.000 -39.4993932862 -50.0000000000 304.23030 0.20000 250 0.392 1815.067 2564.230
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 3100 50 -50.000 -39.4993932862 -50.0000000000 304.23030 0.20000 250 0.392 1815.067 2564.230
episode 3200 total steps 160000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 3200 50 -50.000 -39.4993932862 -50.0000000000 314.90150 0.20000 250 0.352 1807.986 2344.424
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 3200 50 -50.000 -39.4993932862 -50.0000000000 314.90150 0.20000 250 0.352 1807.986 2344.424
episode 3300 total steps 165000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 3300 50 -50.000 -39.4993932862 -50.0000000000 294.56957 0.20000 250 0.408 1827.943 2092.198
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 3300 50 -50.000 -39.4993932862 -50.0000000000 294.56957 0.20000 250 0.408 1827.943 2092.198
episode 3400 total steps 170000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 3400 50 -50.000 -39.4993932862 -50.0000000000 319.27477 0.20000 250 0.312 1847.207 2624.171
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 3400 50 -50.000 -39.4993932862 -50.0000000000 319.27477 0.20000 250 0.312 1847.207 2624.171
episode 3500 total steps 175000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 3500 50 -50.000 -39.4993932862 -50.0000000000 302.88077 0.20000 250 0.396 1832.862 2764.258
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 3500 50 -50.000 -39.4993932862 -50.0000000000 302.88077 0.20000 250 0.396 1832.862 2764.258
episode 3600 total steps 180000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 3600 50 -50.000 -39.4993932862 -50.0000000000 189.69495 0.20000 250 0.328 1798.681 2894.372
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 3600 50 -50.000 -39.4993932862 -50.0000000000 189.69495 0.20000 250 0.328 1798.681 2894.372
episode 3700 total steps 185000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 3700 50 -50.000 -39.4993932862 -50.0000000000 174.25073 0.20000 250 0.388 1799.426 3356.493
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 3700 50 -50.000 -39.4993932862 -50.0000000000 174.25073 0.20000 250 0.388 1799.426 3356.493
episode 3800 total steps 190000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 3800 50 -50.000 -39.4993932862 -50.0000000000 318.32951 0.20000 250 0.356 1815.443 3215.285
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 3800 50 -50.000 -39.4993932862 -50.0000000000 318.32951 0.20000 250 0.356 1815.443 3215.285
episode 3900 total steps 195000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 3900 50 -50.000 -39.4993932862 -50.0000000000 190.00381 0.20000 250 0.372 1768.111 3086.686
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 3900 50 -50.000 -39.4993932862 -50.0000000000 190.00381 0.20000 250 0.372 1768.111 3086.686
episode 4000 total steps 200000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 4000 50 -50.000 -39.4993932862 -50.0000000000 305.49717 0.20000 250 0.320 1800.324 2677.632
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 4000 50 -50.000 -39.4993932862 -50.0000000000 305.49717 0.20000 250 0.320 1800.324 2677.632
episode 4100 total steps 205000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 4100 50 -50.000 -39.4993932862 -50.0000000000 186.22632 0.20000 250 0.296 1756.212 2525.554
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 4100 50 -50.000 -39.4993932862 -50.0000000000 186.22632 0.20000 250 0.296 1756.212 2525.554
episode 4200 total steps 210000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 4200 50 -50.000 -39.4993932862 -50.0000000000 186.62446 0.20000 250 0.312 1745.142 2559.601
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 4200 50 -50.000 -39.4993932862 -50.0000000000 186.62446 0.20000 250 0.312 1745.142 2559.601
episode 4300 total steps 215000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 4300 50 -50.000 -39.4993932862 -50.0000000000 92.11743 0.20000 250 0.424 1706.861 4155.797
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 4300 50 -50.000 -39.4993932862 -50.0000000000 92.11743 0.20000 250 0.424 1706.861 4155.797
episode 4400 total steps 220000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 4400 50 -50.000 -39.4993932862 -50.0000000000 305.29147 0.20000 250 0.372 1807.624 3354.478
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 4400 50 -50.000 -39.4993932862 -50.0000000000 305.29147 0.20000 250 0.372 1807.624 3354.478
episode 4500 total steps 225000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 4500 50 -50.000 -39.4993932862 -50.0000000000 309.62576 0.20000 250 0.400 1839.398 2642.012
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 4500 50 -50.000 -39.4993932862 -50.0000000000 309.62576 0.20000 250 0.400 1839.398 2642.012
episode 4600 total steps 230000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 4600 50 -50.000 -39.4993932862 -50.0000000000 297.69136 0.20000 250 0.460 1855.533 3569.598
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 4600 50 -50.000 -39.4993932862 -50.0000000000 297.69136 0.20000 250 0.460 1855.533 3569.598
episode 4700 total steps 235000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 4700 50 0.000 0.0000000000 0.0000000000 303.72088 0.20000 250 0.392 1881.793 2197.497
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 4700 50 -50.000 -39.4993932862 -50.0000000000 303.72088 0.20000 250 0.392 1881.793 2197.497
episode 4800 total steps 240000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 4800 50 -50.000 -39.4993932862 -50.0000000000 182.33329 0.20000 250 0.344 1847.079 3410.346
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 4800 50 -50.000 -39.4993932862 -50.0000000000 182.33329 0.20000 250 0.344 1847.079 3410.346
episode 4900 total steps 245000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 4900 50 -50.000 -39.4993932862 -50.0000000000 297.86280 0.20000 250 0.456 1911.227 3078.142
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 4900 50 -50.000 -39.4993932862 -50.0000000000 297.86280 0.20000 250 0.456 1911.227 3078.142
episode 5000 total steps 250000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 5000 50 -50.000 -39.4993932862 -50.0000000000 306.03380 0.20000 250 0.420 1913.015 2469.496
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 5000 50 -50.000 -39.4993932862 -50.0000000000 306.03380 0.20000 250 0.420 1913.015 2469.496
episode 5100 total steps 255000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 5100 50 -50.000 -39.4993932862 -50.0000000000 306.94059 0.20000 250 0.468 1928.741 2002.506
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 5100 50 -50.000 -39.4993932862 -50.0000000000 306.94059 0.20000 250 0.468 1928.741 2002.506
episode 5200 total steps 260000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 5200 50 -50.000 -39.4993932862 -50.0000000000 186.94835 0.20000 250 0.324 1915.164 2217.157
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 5200 50 -50.000 -39.4993932862 -50.0000000000 186.94835 0.20000 250 0.324 1915.164 2217.157
episode 5300 total steps 265000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 5300 50 -50.000 -39.4993932862 -50.0000000000 308.41859 0.20000 250 0.356 1961.006 2286.489
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 5300 50 -50.000 -39.4993932862 -50.0000000000 308.41859 0.20000 250 0.356 1961.006 2286.489
episode 5400 total steps 270000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 5400 50 -50.000 -39.4993932862 -50.0000000000 321.18466 0.20000 250 0.356 1953.632 2244.455
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 5400 50 -50.000 -39.4993932862 -50.0000000000 321.18466 0.20000 250 0.356 1953.632 2244.455
episode 5500 total steps 275000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 5500 50 -50.000 -39.4993932862 -50.0000000000 321.13202 0.20000 250 0.344 1972.188 2446.748
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 5500 50 -50.000 -39.4993932862 -50.0000000000 321.13202 0.20000 250 0.344 1972.188 2446.748
episode 5600 total steps 280000 last perf -48.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 5600 50 -50.000 -39.4993932862 -50.0000000000 195.19788 0.20000 250 0.324 1914.795 2102.640
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 5600 50 -50.000 -39.4993932862 -50.0000000000 195.19788 0.20000 250 0.324 1914.795 2102.640
episode 5700 total steps 285000 last perf -47
episode 5800 total steps 290000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 5800 50 -50.000 -39.4993932862 -50.0000000000 306.55587 0.20000 250 0.496 1943.434 2069.213
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 5800 50 -50.000 -39.4993932862 -50.0000000000 306.55587 0.20000 250 0.496 1943.434 2069.213
episode 5900 total steps 295000 last perf 0.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 5900 50 -50.000 -39.4993932862 -50.0000000000 202.24115 0.20000 250 0.324 1889.927 2214.111
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 5900 50 -50.000 -39.4993932862 -50.0000000000 202.24115 0.20000 250 0.324 1889.927 2214.111
episode 6000 total steps 300000 last perf -50.0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 6000 50 -50.000 -39.4993932862 -50.0000000000 304.77453 0.20000 250 0.412 1937.542 1790.405
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 6000 50 -50.000 -39.4993932862 -50.0000000000 304.77453 0.20000 250 0.412 1937.542 1790.405
Perf in OpenAI-Gym/HalfCheetah-v2:
/home/jim/anaconda2/envs/clustering/lib/python3.5/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.24.2) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
{'config': 'config.ini', 'view': False, 'save_best': False, 'load': None, 'render': False, 'capture': False, 'test_only': False}
ENV: <TimeLimit<HalfCheetahEnv<HalfCheetah-v2>>>
State space: Box(17,)
- low: [-inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf -inf
-inf -inf -inf]
- high: [inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf inf]
Action space: Box(6,)
- low: [-1. -1. -1. -1. -1. -1.]
- high: [1. 1. 1. 1. 1. 1.]
Create agent with (nb_motors, nb_sensors) : 6 17
main algo : PeNFAC(lambda)-V
episode 0 total steps 0 last perf 0
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 0 1000 -47.115292 -1.8866550501 -47.1152922702 0.00000 0.20000 0 0.000 43.286 63.462
episode 100 total steps 100000 last perf 0.22751676727166603
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 100 1000 -35.538 -4.6774409211 -35.5381946659 4.48089 0.20000 5000 0.491 405.149 83.786
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 100 1000 -0.049 0.0439331957 -0.0487643295 4.48089 0.20000 5000 0.491 405.149 83.786
episode 200 total steps 200000 last perf 140.46418944639632
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 200 1000 204.978 56.9361475313 204.9779973113 74.57837 0.20000 5000 0.488 654.829 149.948
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 200 1000 48.583 3.4378766450 48.5834364685 74.57837 0.20000 5000 0.488 654.829 149.948
episode 300 total steps 300000 last perf 941.4459374037215
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 300 1000 319.603 86.4225904655 319.6026779233 307.82436 0.20000 5000 0.491 835.650 181.359
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 300 1000 649.431 123.0760117041 649.4310041797 307.82436 0.20000 5000 0.491 835.650 181.359
episode 400 total steps 400000 last perf 1998.9159226163288
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 400 1000 1541.706 137.0518755685 1541.7061888397 2129.72309 0.20000 5000 0.537 1031.988 213.218
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 400 1000 2010.571 136.1758744628 2010.5709348364 2129.72309 0.20000 5000 0.537 1031.988 213.218
episode 500 total steps 500000 last perf 678.2993231530708
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 500 1000 191.508 99.8058855712 191.5079652816 5862.61120 0.20000 5000 0.568 1227.944 245.120
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 500 1000 2799.371 232.6936888905 2799.3706928199 5862.61120 0.20000 5000 0.568 1227.944 245.120
episode 600 total steps 600000 last perf 3279.0396426752723
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : L 600 1000 2524.436 202.3936700988 2524.4356832294 9601.44991 0.20000 5000 0.552 1287.929 266.153
#INFO :/home/jim/ddrl/agent/cacla/src/pybinding/nfac.cpp.85 : T 600 1000 3428.879 278.4329341160 3428.8785063861 9601.44991 0.20000 5000 0.552 1287.929 266.153
With plain DDPG on FetchReach, the reward is also always -50, and when rendered in MuJoCo the gripper does not reach the desired_goal.
command:
python -m baselines.run --alg=ddpg --env=FetchReach-v1 --num_timesteps=5000 --play
(clustering) jim@jim-Inspiron-7577:~/baselines $ python -m baselines.run --alg=ddpg --env=FetchReach-v1 --num_timesteps=5000 --play
/home/jim/anaconda2/envs/clustering/lib/python3.5/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.24.2) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Logging to /tmp/openai-2019-06-24-09-15-06-825990
env_type: robotics
2019-06-24 09:15:14.135417: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-06-24 09:15:14.388060: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:895] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-24 09:15:14.388312: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1105] Found device 0 with properties:
name: GeForce GTX 1050 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.62
pciBusID: 0000:01:00.0
totalMemory: 3.95GiB freeMemory: 3.44GiB
2019-06-24 09:15:14.388329: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1195] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1050 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
Training ddpg on robotics:FetchReach-v1 with arguments
{'network': 'mlp'}
scaling actions by [1. 1. 1. 1.] before executing in env
setting up param noise
param_noise_actor/mlp_fc0/w:0 <- actor/mlp_fc0/w:0 + noise
param_noise_actor/mlp_fc0/b:0 <- actor/mlp_fc0/b:0 + noise
param_noise_actor/mlp_fc1/w:0 <- actor/mlp_fc1/w:0 + noise
param_noise_actor/mlp_fc1/b:0 <- actor/mlp_fc1/b:0 + noise
param_noise_actor/dense/kernel:0 <- actor/dense/kernel:0 + noise
param_noise_actor/dense/bias:0 <- actor/dense/bias:0 + noise
adaptive_param_noise_actor/mlp_fc0/w:0 <- actor/mlp_fc0/w:0 + noise
adaptive_param_noise_actor/mlp_fc0/b:0 <- actor/mlp_fc0/b:0 + noise
adaptive_param_noise_actor/mlp_fc1/w:0 <- actor/mlp_fc1/w:0 + noise
adaptive_param_noise_actor/mlp_fc1/b:0 <- actor/mlp_fc1/b:0 + noise
adaptive_param_noise_actor/dense/kernel:0 <- actor/dense/kernel:0 + noise
adaptive_param_noise_actor/dense/bias:0 <- actor/dense/bias:0 + noise
setting up actor optimizer
actor shapes: [[16, 64], [64], [64, 64], [64], [64, 4], [4]]
actor params: 5508
setting up critic optimizer
regularizing: critic/mlp_fc0/w:0
regularizing: critic/mlp_fc1/w:0
applying l2 regularization with 0.01
critic shapes: [[20, 64], [64], [64, 64], [64], [64, 1], [1]]
critic params: 5569
setting up target updates ...
target_actor/mlp_fc0/w:0 <- actor/mlp_fc0/w:0
target_actor/mlp_fc0/b:0 <- actor/mlp_fc0/b:0
target_actor/mlp_fc1/w:0 <- actor/mlp_fc1/w:0
target_actor/mlp_fc1/b:0 <- actor/mlp_fc1/b:0
target_actor/dense/kernel:0 <- actor/dense/kernel:0
target_actor/dense/bias:0 <- actor/dense/bias:0
setting up target updates ...
target_critic/mlp_fc0/w:0 <- critic/mlp_fc0/w:0
target_critic/mlp_fc0/b:0 <- critic/mlp_fc0/b:0
target_critic/mlp_fc1/w:0 <- critic/mlp_fc1/w:0
target_critic/mlp_fc1/b:0 <- critic/mlp_fc1/b:0
target_critic/output/kernel:0 <- critic/output/kernel:0
target_critic/output/bias:0 <- critic/output/bias:0
Using agent with the following configuration:
dict_items([('clip_norm', None), ('target_init_updates', [<tf.Operation 'group_deps_4' type=NoOp>, <tf.Operation 'group_deps_6' type=NoOp>]), ('critic_with_actor_tf', <tf.Tensor 'clip_by_value_3:0' shape=(?, 1) dtype=float32>), ('perturb_adaptive_policy_ops', <tf.Operation 'group_deps_1' type=NoOp>), ('return_range', (-inf, inf)), ('obs1', <tf.Tensor 'obs1:0' shape=(?, 16) dtype=float32>), ('perturbed_actor_tf', <tf.Tensor 'param_noise_actor/Tanh_2:0' shape=(?, 4) dtype=float32>), ('actor_tf', <tf.Tensor 'actor/Tanh_2:0' shape=(?, 4) dtype=float32>), ('memory', <baselines.ddpg.memory.Memory object at 0x7f33d5a49b00>), ('actor_optimizer', <baselines.common.mpi_adam.MpiAdam object at 0x7f33c0ad6e80>), ('normalize_observations', True), ('critic_optimizer', <baselines.common.mpi_adam.MpiAdam object at 0x7f341f6cdb70>), ('terminals1', <tf.Tensor 'terminals1:0' shape=(?, 1) dtype=float32>), ('batch_size', 64), ('actor_grads', <tf.Tensor 'concat:0' shape=(5508,) dtype=float32>), ('actor_loss', <tf.Tensor 'Neg:0' shape=() dtype=float32>), ('initial_state', None), ('stats_ops', [<tf.Tensor 'Mean_3:0' shape=() dtype=float32>, <tf.Tensor 'Mean_4:0' shape=() dtype=float32>, <tf.Tensor 'Mean_5:0' shape=() dtype=float32>, <tf.Tensor 'Sqrt_1:0' shape=() dtype=float32>, <tf.Tensor 'Mean_8:0' shape=() dtype=float32>, <tf.Tensor 'Sqrt_2:0' shape=() dtype=float32>, <tf.Tensor 'Mean_11:0' shape=() dtype=float32>, <tf.Tensor 'Sqrt_3:0' shape=() dtype=float32>, <tf.Tensor 'Mean_14:0' shape=() dtype=float32>, <tf.Tensor 'Sqrt_4:0' shape=() dtype=float32>]), ('actor', <baselines.ddpg.models.Actor object at 0x7f33c2709358>), ('stats_sample', None), ('target_Q', <tf.Tensor 'add_2:0' shape=(?, 1) dtype=float32>), ('critic', <baselines.ddpg.models.Critic object at 0x7f33c2709320>), ('param_noise_stddev', <tf.Tensor 'param_noise_stddev:0' shape=() dtype=float32>), ('action_noise', None), ('observation_range', (-5.0, 5.0)), ('target_soft_updates', [<tf.Operation 'group_deps_5' type=NoOp>, <tf.Operation 'group_deps_7' type=NoOp>]), ('critic_loss', <tf.Tensor 'add_15:0' shape=() dtype=float32>), ('target_critic', <baselines.ddpg.models.Critic object at 0x7f33c2709470>), ('stats_names', ['obs_rms_mean', 'obs_rms_std', 'reference_Q_mean', 'reference_Q_std', 'reference_actor_Q_mean', 'reference_actor_Q_std', 'reference_action_mean', 'reference_action_std', 'reference_perturbed_action_mean', 'reference_perturbed_action_std']), ('ret_rms', None), ('critic_tf', <tf.Tensor 'clip_by_value_2:0' shape=(?, 1) dtype=float32>), ('normalized_critic_with_actor_tf', <tf.Tensor 'critic_1/output/BiasAdd:0' shape=(?, 1) dtype=float32>), ('gamma', 0.99), ('action_range', (-1.0, 1.0)), ('adaptive_policy_distance', <tf.Tensor 'Sqrt:0' shape=() dtype=float32>), ('normalize_returns', False), ('reward_scale', 1.0), ('critic_target', <tf.Tensor 'critic_target:0' shape=(?, 1) dtype=float32>), ('param_noise', AdaptiveParamNoiseSpec(initial_stddev=0.2, desired_action_stddev=0.2, adoption_coefficient=1.01)), ('enable_popart', False), ('actions', <tf.Tensor 'actions:0' shape=(?, 4) dtype=float32>), ('critic_grads', <tf.Tensor 'concat_2:0' shape=(5569,) dtype=float32>), ('perturb_policy_ops', <tf.Operation 'group_deps' type=NoOp>), ('normalized_critic_tf', <tf.Tensor 'critic/output/BiasAdd:0' shape=(?, 1) dtype=float32>), ('obs_rms', <baselines.common.mpi_running_mean_std.RunningMeanStd object at 0x7f33c2709eb8>), ('actor_lr', 0.0001), ('critic_lr', 0.001), ('obs0', <tf.Tensor 'obs0:0' shape=(?, 16) dtype=float32>), ('critic_l2_reg', 0.01), ('rewards', 
<tf.Tensor 'rewards:0' shape=(?, 1) dtype=float32>), ('target_actor', <baselines.ddpg.models.Actor object at 0x7f33c246e940>), ('tau', 0.01)])
---------------------------------------------
| obs_rms_mean | 0.49 |
| obs_rms_std | 0.156 |
| param_noise_stddev | 0.164 |
| reference_action_mean | 0.029 |
| reference_action_std | 0.773 |
| reference_actor_Q_mean | -7.02 |
| reference_actor_Q_std | 0.745 |
| reference_perturbed_action_... | 0.033 |
| reference_perturbed_action_std | 0.781 |
| reference_Q_mean | -7.11 |
| reference_Q_std | 0.674 |
| rollout/actions_mean | 0.0536 |
| rollout/actions_std | 0.659 |
| rollout/episode_steps | 50 |
| rollout/episodes | 40 |
| rollout/Q_mean | -2.97 |
| rollout/return | -49.8 |
| rollout/return_history | -49.8 |
| rollout/return_history_std | 1.25 |
| rollout/return_std | 1.25 |
| total/duration | 12.2 |
| total/episodes | 40 |
| total/epochs | 1 |
| total/steps | 2e+03 |
| total/steps_per_second | 164 |
| train/loss_actor | 6.83 |
| train/loss_critic | 0.808 |
| train/param_noise_distance | 0.596 |
---------------------------------------------
---------------------------------------------
| obs_rms_mean | 0.502 |
| obs_rms_std | 0.146 |
| param_noise_stddev | 0.134 |
| reference_action_mean | 0.107 |
| reference_action_std | 0.784 |
| reference_actor_Q_mean | -11.5 |
| reference_actor_Q_std | 3.23 |
| reference_perturbed_action_... | 0.319 |
| reference_perturbed_action_std | 0.651 |
| reference_Q_mean | -11.8 |
| reference_Q_std | 2.89 |
| rollout/actions_mean | 0.0836 |
| rollout/actions_std | 0.686 |
| rollout/episode_steps | 50 |
| rollout/episodes | 80 |
| rollout/Q_mean | -6.57 |
| rollout/return | -49.8 |
| rollout/return_history | -49.8 |
| rollout/return_history_std | 0.972 |
| rollout/return_std | 0.972 |
| total/duration | 22.9 |
| total/episodes | 80 |
| total/epochs | 2 |
| total/steps | 4e+03 |
| total/steps_per_second | 175 |
| train/loss_actor | 12 |
| train/loss_critic | 1.98 |
| train/param_noise_distance | 0.326 |
---------------------------------------------
Running trained model
Creating window glfw
episode_rew=-50.0
(episode_rew=-50.0 repeated for the remaining 45 test episodes)
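The constant episode_rew=-50.0 matches the sparse reward in the Fetch tasks: each step where the goal is not reached gives -1, and episodes last 50 steps, so a policy that never reaches the goal always totals exactly -50. A quick way to see this reward structure (assuming gym and mujoco-py are installed; the random policy is just for illustration):

import gym

env = gym.make('FetchReach-v1')
env.reset()
total = 0.0
for _ in range(50):  # Fetch episodes are 50 steps long
    _, reward, _, _ = env.step(env.action_space.sample())  # random actions
    total += reward  # sparse reward: -1.0 until the goal is reached, 0.0 after
print(total)  # almost always -50.0 under a random policy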
It's expected that vanilla PeNFAC can't easily solve this task because of the sparse rewards (just like vanilla DDPG, vanilla PPO, etc.).
I developed a "data augmentation" module for PeNFAC similar to HER (even if we can't talk of off-policy replay here); see 426b203 and the rough sketch after the list below.
If you want to use it you need to:
- change your config.ini to use "libddrl-hpenfac.so" instead of "libddrl-penfac.so"
- add the command line argument "--goal-based" when you call python run.py
- add the hyperparameter "hindsight_nb_destination=5" to the [agent] section of config.ini
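Roughly, it follows HER's "future" strategy: for each step of a trajectory, sample up to hindsight_nb_destination goals that were actually achieved later in the same episode, substitute them for the desired goal, and recompute the sparse reward, so that even failed episodes produce successful transitions. A minimal sketch of the idea (hypothetical names and data layout, not the actual ddrl code):

import numpy as np

def hindsight_augment(trajectory, nb_destination=5, eps=0.05):
    # trajectory: list of dicts with 'obs', 'action' and 'achieved_goal' numpy arrays
    augmented = []
    for t, step in enumerate(trajectory[:-1]):
        future = trajectory[t + 1:]
        n = min(nb_destination, len(future))
        for i in np.random.choice(len(future), size=n, replace=False):
            new_goal = future[i]['achieved_goal']
            # recompute the sparse reward against the substituted goal
            reward = 0.0 if np.linalg.norm(step['achieved_goal'] - new_goal) < eps else -1.0
            augmented.append({'obs': step['obs'], 'action': step['action'],
                              'goal': new_goal, 'reward': reward})
    return augmented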
Here are preliminary results on the environment you tried:
config.ini:
...
[agent]
gamma=0.98
decision_each=1
#policy
noise=0.2
gaussian_policy=1
hidden_unit_v=64:64
hidden_unit_a=64:64
momentum=0
actor_output_layer_type=2
hidden_layer_type=1
#learning
alpha_a=0.0001
alpha_v=0.001
batch_norm_actor=7
batch_norm_critic=0
update_critic_first=true
number_fitted_iteration=10
stoch_iter_critic=1
lambda=0.9
gae=true
update_each_episode=3
stoch_iter_actor=1
beta_target=0.03
ignore_poss_ac=false
conserve_beta=true
disable_cac=false
disable_trust_region=true
hindsight_nb_destination=5
OK, what do you suggest we do next?
Should we use success_rate or the last reward for the comparison?
@matthieu637
I modified the gym/run.py file to output the success_rate like this:
while sample_steps_counter < total_max_steps + testing_each * max_steps:
    if episode % display_log_each == 0:
        # per-step reward is -1 on failure and 0 on success, so
        # results[-1] + max_steps counts the successful steps of the last test episode
        success_rate = (results[-1] + max_steps) / max_steps if len(results) > 0 else 0
        n_epoch = episode // display_log_each
        print('n_epoch', n_epoch, 'success rate', success_rate)
        writer.add_scalar(env_name + 'success_rate_hpenfac', success_rate, n_epoch + 1)
        print('episode', episode, 'total steps', sample_steps_counter, 'last perf', results[-1] if len(results) > 0 else 0)
And here is the comparison of success_rate between DDPG+HER and hpenfac with the original hyperparameters:
:~$ python -m baselines.run --alg=her --env=FetchPush-v1 --num_timesteps=2.5e6
:~$ python run.py --goal-based
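One way to get the DDPG+HER curve onto the same axis is to read test/success_rate from the progress.csv that Baselines writes into its log directory; a sketch, assuming pandas is available and using whatever path "Logging to ..." printed at startup:

import pandas as pd
import matplotlib.pyplot as plt

# the path is whatever "Logging to ..." printed when the HER run started
progress = pd.read_csv('/tmp/openai-2019-06-24-09-15-06-825990/progress.csv')
plt.plot(progress['epoch'], progress['test/success_rate'], label='ddpg+her')
plt.xlabel('n_epoch')
plt.ylabel('success rate')
plt.legend(loc=2)
plt.show()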
Here I used 2.5e6 total_max_steps and a config.ini like this:
[simulation]
total_max_steps=2500000
testing_each=10
#number of trajectories for testing
testing_trials=10
dump_log_each=50
display_log_each=100
save_agent_each=100000
library=/home/jim/ddrl/agent/cacla/lib/libddrl-hpenfac.so
; env_name=RoboschoolHalfCheetah-v1
; env_name=HalfCheetah-v2
env_name=FetchPush-v1
[agent]
gamma=0.98
decision_each=1
#policy
noise=0.2
gaussian_policy=1
hidden_unit_v=64:64
hidden_unit_a=64:64
momentum=0
actor_output_layer_type=2
hidden_layer_type=1
#learning
alpha_a=0.0001
alpha_v=0.001
batch_norm_actor=7
batch_norm_critic=0
reward_scale=1.0
vnn_from_scratch=false
update_critic_first=true
number_fitted_iteration=10
stoch_iter_critic=1
lambda=0.9
gae=true
update_each_episode=3
stoch_iter_actor=1
beta_target=0.03
ignore_poss_ac=false
conserve_beta=true
disable_cac=false
disable_trust_region=true
hindsight_nb_destination=5
I plan to use the hyperparameters from DDPG+HER, as listed here in Sec. 2.2; can you share some tips on how to modify your code to use the same hyperparameters?
On FetchReach:
[simulation]
total_max_steps=2500000
testing_each=10
#number of trajectories for testing
testing_trials=10
dump_log_each=50
display_log_each=100
save_agent_each=100000
library=/home/jim/ddrl/agent/cacla/lib/libddrl-hpenfac.so
env_name=FetchReach-v1
[agent]
gamma=0.98
decision_each=1
#policy
noise=0.2
gaussian_policy=1
hidden_unit_v=64:64
hidden_unit_a=64:64
momentum=0
actor_output_layer_type=2
hidden_layer_type=1
#learning
alpha_a=0.0001
alpha_v=0.001
batch_norm_actor=7
batch_norm_critic=0
reward_scale=1.0
vnn_from_scratch=false
update_critic_first=true
number_fitted_iteration=10
stoch_iter_critic=1
lambda=0.9
gae=true
update_each_episode=3
stoch_iter_actor=1
beta_target=0.03
ignore_poss_ac=false
conserve_beta=true
disable_cac=false
disable_trust_region=true
hindsight_nb_destination=5
:~$ python -m baselines.run --alg=her --env=FetchReach-v1 --num_timesteps=2.5e6
For PeNFAC, you're computing a success rate equivalent to "how many times I reached the goal within one episode", whereas in HER it only checks whether the goal was reached at the end of the episode.
The two curves are not comparable: PeNFAC is penalized since the intermediate steps count as failures.
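Concretely, with the Fetch reward convention (-1 per failed step, 0 on success) over 50-step episodes, the two metrics can diverge a lot; a small illustrative sketch (hypothetical helper names):

def per_step_success(rewards, max_steps=50):
    # the run.py computation above: fraction of steps spent at the goal
    return (sum(rewards) + max_steps) / max_steps

def final_step_success(rewards):
    # HER's test/success_rate: only the final step matters
    return 1.0 if rewards[-1] == 0 else 0.0

episode = [-1] * 45 + [0] * 5        # the goal is only reached for the last 5 steps
print(per_step_success(episode))     # 0.1 -- penalized by the intermediate failures
print(final_step_success(episode))   # 1.0 -- counted as a full success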
@matthieu637
Did you mean I should use the last perf of every test episode to calculate the success_rate?
@matthieu637
I used this Python script to plot the success_rate from 0.1.monitor.csv:
import numpy as np
from numpy import genfromtxt
import matplotlib.pyplot as plt
import matplotlib

matplotlib.rcParams.update({'font.size': 12})
plt.rcParams["font.family"] = "Times New Roman"

episodes = 800
epochs = 200
env = 'FetchPush-v1'

# column 3 of the monitor file is the is_success flag; drop the header row
data = genfromtxt('0.1.monitor.csv', delimiter=',')
is_success = data[:, 3][1:len(data)]
# epochs * episodes must equal the number of logged episodes
to_epoch = is_success.reshape(epochs, episodes)

x, y = [], []
for epoch, last_perfs in enumerate(to_epoch):
    success_rate = np.sum(last_perfs) / episodes
    x = x + [epoch]
    y = y + [success_rate]

plt.figure(figsize=(15, 10))
plt.plot(x, y, marker='o', linestyle='-', markersize=2, linewidth=1, label='hpenfac')
plt.xlabel('n_epoch')
plt.ylabel('success rate')
plt.title(env)
plt.legend(loc=2)
plt.savefig(env + '.png')
plt.show()
The performance with the 64:64 networks looks like this:
hyperparameters:
[simulation]
total_max_steps=8000000
testing_each=10
#number of trajectories for testing
testing_trials=10
dump_log_each=50
display_log_each=100
save_agent_each=100000
library=/home/jim/ddrl/agent/cacla/lib/libddrl-hpenfac.so
env_name=FetchPush-v1
[agent]
gamma=0.98
decision_each=1
#policy
noise=0.2
gaussian_policy=1
hidden_unit_v=64:64
hidden_unit_a=64:64
momentum=0
actor_output_layer_type=2
hidden_layer_type=1
#learning
alpha_a=0.0001
alpha_v=0.001
batch_norm_actor=7
batch_norm_critic=0
reward_scale=1.0
vnn_from_scratch=false
update_critic_first=true
number_fitted_iteration=10
stoch_iter_critic=1
lambda=0.9
gae=true
update_each_episode=3
stoch_iter_actor=1
beta_target=0.03
ignore_poss_ac=false
conserve_beta=true
disable_cac=false
disable_trust_region=true
hindsight_nb_destination=5
The performance with the 256:256:256 networks looks like this:
hyperparameters:
[simulation]
total_max_steps=8000000
testing_each=10
#number of trajectories for testing
testing_trials=10
dump_log_each=50
display_log_each=100
save_agent_each=100000
library=../agent/cacla/lib/libddrl-hpenfac.so
env_name=FetchPush-v1
[agent]
gamma=0.98
decision_each=1
#policy
noise=0.2
gaussian_policy=1
hidden_unit_v=256:256:256
hidden_unit_a=256:256:256
momentum=0
actor_output_layer_type=2
hidden_layer_type=3
#learning
alpha_a=0.001
alpha_v=0.001
batch_norm_actor=7
batch_norm_critic=0
reward_scale=1.0
vnn_from_scratch=false
update_critic_first=true
number_fitted_iteration=10
stoch_iter_critic=1
lambda=0.9
gae=true
update_each_episode=3
stoch_iter_actor=1
beta_target=0.03
ignore_poss_ac=false
conserve_beta=true
disable_cac=false
disable_trust_region=true
hindsight_nb_destination=5
My bad, I was talking about FetchReach-v1.
For Reach I guess you have to start optimizing the hyperparameters.
Thanks @matthieu637,
I can run DDPG+HER later to see its performance on FetchReach.
But they didn't evaluate FetchReach in the paper (https://arxiv.org/abs/1707.01495); I think it is hard to compare because its performance varies a lot and it reaches a 100% success rate very quickly.
I don't understand what you mean by the second sentence. I am manually changing the hyperparameters for FetchPush while I study how to use lhpo, but I worry I can't finish the hyperparameter optimization before the deadline.
@matthieu637
For the FetchReach-v1 task with DDPG+HER, the performance is far superior: it only needs 4 epochs × 10 episodes × 50 timesteps = 2000 total timesteps to reach a 100% test_success_rate.
jim@jim-Inspiron-7577:~/baselines $ python -m baselines.run --alg=her --env=FetchReach-v1 --num_timesteps=8e5 --n_cycles=10
/home/jim/anaconda2/envs/clustering/lib/python3.5/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.24.2) or chardet (3.0.4) doesn't match a supported version!
RequestsDependencyWarning)
Logging to /tmp/openai-2019-06-28-20-56-42-334498
env_type: robotics
2019-06-28 20:56:43.049827: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-06-28 20:56:43.051216: E tensorflow/stream_executor/cuda/cuda_driver.cc:406] failed call to cuInit: CUDA_ERROR_UNKNOWN
2019-06-28 20:56:43.051249: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: jim-Inspiron-7577
2019-06-28 20:56:43.051259: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: jim-Inspiron-7577
2019-06-28 20:56:43.051294: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: 410.78.0
2019-06-28 20:56:43.051319: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:369] driver version file contents: """NVRM version: NVIDIA UNIX x86_64 Kernel Module 410.78 Sat Nov 10 22:09:04 CST 2018
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)
"""
2019-06-28 20:56:43.051335: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: 410.78.0
2019-06-28 20:56:43.051343: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:300] kernel version seems to match DSO: 410.78.0
Training her on robotics:FetchReach-v1 with arguments
{'network': 'mlp', 'n_cycles': 10}
T: 50
_Q_lr: 0.001
_action_l2: 1.0
_batch_size: 256
_buffer_size: 1000000
_clip_obs: 200.0
_hidden: 256
_layers: 3
_max_u: 1.0
_network_class: baselines.her.actor_critic:ActorCritic
_norm_clip: 5
_norm_eps: 0.01
_pi_lr: 0.001
_polyak: 0.95
_relative_goals: False
_scope: ddpg
aux_loss_weight: 0.0078
bc_loss: 0
ddpg_params: {'batch_size': 256, 'max_u': 1.0, 'action_l2': 1.0, 'network_class': 'baselines.her.actor_critic:ActorCritic', 'norm_clip': 5, 'polyak': 0.95, 'buffer_size': 1000000, 'layers': 3, 'clip_obs': 200.0, 'scope': 'ddpg', 'norm_eps': 0.01, 'hidden': 256, 'relative_goals': False, 'pi_lr': 0.001, 'Q_lr': 0.001}
demo_batch_size: 128
env_name: FetchReach-v1
gamma: 0.98
make_env: <function prepare_params.<locals>.make_env at 0x7f12f5fac510>
n_batches: 40
n_cycles: 10
n_test_rollouts: 10
noise_eps: 0.2
num_demo: 100
prm_loss_weight: 0.001
q_filter: 0
random_eps: 0.3
replay_k: 4
replay_strategy: future
rollout_batch_size: 1
test_with_polyak: False
*** Warning ***
You are running HER with just a single MPI worker. This will work, but the experiments that we report in Plappert et al. (2018, https://arxiv.org/abs/1802.09464) were obtained with --num_cpu 19. This makes a significant difference and if you are looking to reproduce those results, be aware of this. Please also refer to https://github.com/openai/baselines/issues/314 for further details.
****************
Creating a DDPG agent with action space 4 x 1.0...
Training...
---------------------------------
| epoch | 0 |
| stats_g/mean | 0.914 |
| stats_g/std | 0.107 |
| stats_o/mean | 0.271 |
| stats_o/std | 0.0339 |
| test/episode | 10 |
| test/mean_Q | -0.356 |
| test/success_rate | 0.6 |
| train/episode | 10 |
| train/success_rate | 0 |
---------------------------------
---------------------------------
| epoch | 1 |
| stats_g/mean | 0.885 |
| stats_g/std | 0.112 |
| stats_o/mean | 0.264 |
| stats_o/std | 0.0351 |
| test/episode | 20 |
| test/mean_Q | -1.02 |
| test/success_rate | 0.7 |
| train/episode | 20 |
| train/success_rate | 0.7 |
---------------------------------
---------------------------------
| epoch | 2 |
| stats_g/mean | 0.881 |
| stats_g/std | 0.11 |
| stats_o/mean | 0.263 |
| stats_o/std | 0.035 |
| test/episode | 30 |
| test/mean_Q | -0.579 |
| test/success_rate | 1 |
| train/episode | 30 |
| train/success_rate | 0.8 |
---------------------------------
---------------------------------
| epoch | 3 |
| stats_g/mean | 0.874 |
| stats_g/std | 0.107 |
| stats_o/mean | 0.261 |
| stats_o/std | 0.0343 |
| test/episode | 40 |
| test/mean_Q | -0.553 |
| test/success_rate | 1 |
| train/episode | 40 |
| train/success_rate | 0.8 |
---------------------------------
---------------------------------
| epoch | 4 |
| stats_g/mean | 0.874 |
| stats_g/std | 0.103 |
| stats_o/mean | 0.261 |
| stats_o/std | 0.0335 |
| test/episode | 50 |
| test/mean_Q | -0.526 |
| test/success_rate | 1 |
| train/episode | 50 |
| train/success_rate | 1 |
---------------------------------
Fixed in the last commit.
Produced without hyperparameter optimization, with these params:
[simulation]
total_max_steps = 2000000
testing_each = 100
testing_trials = 40
dump_log_each = 1
display_log_each = 200
save_agent_each = 10000000
library = ..../ddrl/agent/cacla/lib/libddrl-hpenfac.so
env_name=FetchPush-v1
[agent]
gamma = 0.98
noise = 0.35
gaussian_policy = 1
hidden_unit_v = 256:256:256
hidden_unit_a = 256:256:256
actor_output_layer_type = 2
hidden_layer_type = 3
alpha_a = 0.0005
alpha_v = 0.001
number_fitted_iteration = 10
stoch_iter_critic = 1
lambda = 0.6
gae = true
update_each_episode = 40
stoch_iter_actor = 10
beta_target = 0.03
ignore_poss_ac = false
conserve_beta = true
disable_cac = false
disable_trust_region = true
hindsight_nb_destination = 3
@matthieu637
Hi Mat, do you remember which trick you used to make Hindsight Augmentation work with PeNFAC?