seungeunrho/minimalRL
Implementations of basic RL algorithms with minimal lines of code! (PyTorch based)
Python · MIT License
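To make "minimal lines of code" concrete, here is a rough sketch of a REINFORCE-style training loop in that spirit. This is an illustrative example only, not the repository's actual reinforce.py, and it assumes CartPole-v1 together with the Gym ≥ 0.26 / Gymnasium reset()/step() API (the tuple return from reset() in newer Gym versions is likely what issue #59 below runs into).

```python
# Illustrative sketch of a minimal REINFORCE training loop (PyTorch).
# Not the repository's code; assumes CartPole-v1 and Gym >= 0.26 API.
import gym
import torch
import torch.nn as nn
import torch.optim as optim
from torch.distributions import Categorical

class Policy(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 128), nn.ReLU(), nn.Linear(128, 2))

    def forward(self, x):
        return torch.softmax(self.net(x), dim=-1)

def main():
    env = gym.make("CartPole-v1")
    pi = Policy()
    optimizer = optim.Adam(pi.parameters(), lr=5e-4)
    gamma = 0.98

    for episode in range(1000):
        obs, _ = env.reset()
        log_probs, rewards = [], []
        done = False
        while not done:
            dist = Categorical(pi(torch.from_numpy(obs).float()))
            action = dist.sample()
            obs, reward, terminated, truncated, _ = env.step(action.item())
            done = terminated or truncated
            log_probs.append(dist.log_prob(action))
            rewards.append(reward)

        # Monte-Carlo returns G_t, then the REINFORCE loss -sum_t log pi(a_t|s_t) * G_t
        returns, G = [], 0.0
        for r in reversed(rewards):
            G = r + gamma * G
            returns.insert(0, G)
        loss = -(torch.stack(log_probs) * torch.tensor(returns)).sum()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

if __name__ == "__main__":
    main()
```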
Issues
Add meta RL algorithms?
#49 opened - 0
Wrong formula for calc-target in SAC?
#63 opened by BeFranke - 1
Training speed is very slow!!!
#62 opened by xuzhou666 - 1
TypeError: expected np.ndarray (got tuple)
#59 opened by InguChoi - 0
DQN why train iterate for 10 times
#57 opened by FeynmanDNA - 0
MuZero minimal implementation
#56 opened by ipsec - 2
Add minimal IMPALA?
#51 opened by meadewaking - 5
The ratio in ppo.py should be detach()?
#33 opened by dedekinds - 1
Remove unused import
#43 opened by neal2018 - 0
Minimal way to save / replay trained model?
#52 opened by HanClinto - 3
Add new algorithms
#11 opened by rahulptel - 0
Query about LSTM
#50 opened by npitsillos - 0
cartpole ppo train, reward drop
#42 opened by SeungyounShin - 1
Maybe a bug in SAC Implementation?
#40 opened by arthur-x - 0
Please add 1 continuous env
#6 opened by bionicles - 2
PPO Continuous Action Space
#12 opened by raunakdoesdev - 1
Soft Actor Critic?
#38 opened by EmmanuelMess - 3
Missing done mask?
#32 opened by Junyoungpark - 1
PPO update mistake?
#36 opened by zcaicaros - 1
Questions about A3C
#29 opened by LoveRL - 2
RuntimeError while running DDPG.py
#34 opened by rl-max - 2
TD3: Twin Delayed DDPG
#37 opened by zcaicaros - 0
torch.gather in relevance to policy gradient
#31 opened by migom6 - 0
PPO has no entropy factor
#30 opened by CesMak - 1
Problem of `train_net()` in REINFORCE algorithm.
#26 opened by fuyw - 0
Add SAC?
#19 opened by banma12956 - 0
LSTM + PPO value fitting
#17 opened by hnshahao - 1
Improper asynchronous update in a3c
#9 opened by rahulptel - 1
Typo of actor_critic.py?
#7 opened by seungwonpark - 1
Use maxlen in deque initializer
#3 opened by jwergieluk - 1