seungeunrho/minimalRL
Implementations of basic RL algorithms with minimal lines of code! (PyTorch based)
Python · MIT License
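To make "minimal lines of code" concrete, here is a rough sketch of a REINFORCE-style training loop in that spirit. This is an illustrative example only, not the repository's actual reinforce.py, and it assumes CartPole-v1 together with the Gym ≥ 0.26 / Gymnasium reset()/step() API (the tuple return from reset() in newer Gym versions is likely what issue #59 below runs into).

```python
# Illustrative sketch of a minimal REINFORCE training loop (PyTorch).
# Not the repository's code; assumes CartPole-v1 and Gym >= 0.26 API.
import gym
import torch
import torch.nn as nn
import torch.optim as optim
from torch.distributions import Categorical

class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, x):
        return torch.softmax(self.net(x), dim=-1)

def main():
    env = gym.make("CartPole-v1")
    pi = Policy()
    optimizer = optim.Adam(pi.parameters(), lr=5e-4)
    gamma = 0.98

    for episode in range(1000):
        obs, _ = env.reset()
        log_probs, rewards = [], []
        done = False
        while not done:
            dist = Categorical(pi(torch.from_numpy(obs).float()))
            action = dist.sample()
            obs, reward, terminated, truncated, _ = env.step(action.item())
            done = terminated or truncated
            log_probs.append(dist.log_prob(action))
            rewards.append(reward)

        # Monte-Carlo returns G_t, then the REINFORCE loss -sum_t log pi(a_t|s_t) * G_t
        returns, G = [], 0.0
        for r in reversed(rewards):
            G = r + gamma * G
            returns.insert(0, G)
        loss = -(torch.stack(log_probs) * torch.tensor(returns)).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    main()
```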
Issues
Add meta RL algorithms?
#49 opened - 0
Wrong formula for calc-target in SAC?
#63 opened by BeFranke - 1
Training speed is very slow!!!
#62 opened by xuzhou666 - 1
TypeError: expected np.ndarray (got tuple)
#59 opened by InguChoi - 0
DQN why train iterate for 10 times
#57 opened by FeynmanDNA - 0
MuZero minimal implementation
#56 opened by ipsec - 2
Add minimal IMPALA?
#51 opened by meadewaking - 5
The ratio in ppo.py should be detach()?
#33 opened by dedekinds - 1
Remove unused import
#43 opened by neal2018 - 0
Minimal way to save / replay trained model?
#52 opened by HanClinto - 3
Add new algorithms
#11 opened by rahulptel - 0
Query about LSTM
#50 opened by npitsillos - 0
cartpole ppo train, reward drop
#42 opened by SeungyounShin - 1
Maybe a bug in SAC Implementation?
#40 opened by arthur-x - 0
Please add 1 continuous env
#6 opened by bionicles - 2
PPO Continuous Action Space
#12 opened by raunakdoesdev - 1
Soft Actor Critic?
#38 opened by EmmanuelMess - 3
Missing done mask?
#32 opened by Junyoungpark - 1
PPO update mistake?
#36 opened by zcaicaros - 1
Questions about A3C
#29 opened by LoveRL - 2
RuntimeError while running DDPG.py
#34 opened by rl-max - 2
TD3: Twin Delayed DDPG
#37 opened by zcaicaros - 0
torch.gather in relevance to policy gradient
#31 opened by migom6 - 0
PPO has no entropy factor
#30 opened by CesMak - 1
Problem of `train_net()` in REINFORCE algorithm.
#26 opened by fuyw - 0
Add SAC?
#19 opened by banma12956 - 0
LSTM + PPO value fitting
#17 opened by hnshahao - 1
Improper asynchronous update in a3c
#9 opened by rahulptel - 1
Typo of actor_critic.py?
#7 opened by seungwonpark - 1
Use maxlen in deque initializer
#3 opened by jwergieluk - 1