hill-a/stable-baselines
A fork of OpenAI Baselines, implementations of reinforcement learning algorithms
Python · MIT license
Issues
Using multiple environments with Unity ML-Agents
#1193 opened by 871234342 - 4
MlpPolicy network output layer softmax activation for continuous action space problem?
#1190 opened by wbzhang233 - 1
Resume Training on Checkpoint
#1188 opened by mingjohnson - 0
resume the training
#1192 opened by rambo1111 - 1
Model generated using VecNormalize, model predict use case
#1189 opened by madhekar - 0
VecNormalize 'training' attribute
#1187 opened by madhekar - 0
Customize training process
#1186 opened by dunhuiliu - 0
AssertionError, observation space, VecTransposeImage
#1185 opened by skgr07 - 0
Dimension mismatch when using custom feature extractor
#1184 opened by yassinetb - 8
[question] How to get the model architecture when using recurrent policy?
#1145 opened by borninfreedom - 0
[Question] Can I train agents in a nested loop in SB3?
#1183 opened by j-thib - 0
python setup.py egg_info did not run successfully.
#1182 opened by perp1exed - 0
[question] TypeError: 'NoneType' object is not callable with user defined env
#1181 opened by Charles-Lim93 - 0
DQN report. [QUESTION]
#1180 opened by smbrine - 0
[question] _on_step method in custom callback
#1179 opened by vrige - 0
Can I use an agent with act, and observe interactions with no/minimum use of environment?
#1178 opened by aheidariiiiii1993 - 1
[question] PPO load using .pkl file
#1176 opened by meric-sakarya - 0
[question] for an RL algorithm with a discrete action space, is it possible to get a probability of outcomes when feeding in data?
#1171 opened by george-adams1 - 1
[Question] Callback-collected model does not have same reward as training verbose [custom gym environment]
#1170 opened by hotpotking-lol - 2
Store the training result
#1169 opened by hotpotking-lol - 2
Environment checker returns assertion error contradicting debug statements
#1168 opened by techboy-coder - 6
True rewards remaining "zero" in the trajectories in Stable Baselines 2 for custom environments
#1167 opened by moizuet - 2
Deep Q-value network evaluation in SAC algorithm
#1166 opened by moizuet - 1
SAC results with large variance
#1150 opened by dibbla - 0
1D Vector of floats as an observation space
#1164 opened by WilliamFlinchbaugh - 2
FPS varies enormously
#1161 opened by leo2r - 4
Cannot install stable baselines 3
#1160 opened by OishikGuha - 2
Accessing observations during training aka .learn()
#1159 opened by user-1701 - 1
Is it wrong to reward an action on the next step?
#1158 opened by DaniilKardava - 1
Data normalization for a2c inputs?
#1157 opened by DaniilKardava - 1
Prediction for same observation using same model
#1156 opened by DaniilKardava - 3
Invalid Actions, Mask and DQN
#1155 opened by Cyazd - 1
Problem retraining PPO1 model and using Tensorflow with Stable Baselines 2
#1154 opened by durantagre - 1
Minigrid -- "Kernel size can't be greater than actual input size" for DQN
#1153 opened by raymond2338 - 1
Running Stable Baselines on M1 Macs?
#1152 opened by adamnhaka - 7
Unable to see stable-baselines output
#1151 opened by Michael-HK - 3
[question] How do I load a tensorflow ckpt?
#1147 opened by Syzygianinfern0 - 1
[question] How to use previously obtained state-action-reward-next state information to save time on training?
#1146 opened by kwak9601 - 1
import stable-baselines [Question]
#1144 opened by MariaPiaGelos - 1
PPO ValueError: The parameter loc has invalid values
#1143 opened by olyanos