DLR-RM/stable-baselines3
PyTorch version of Stable Baselines, reliable implementations of reinforcement learning algorithms.
Python · MIT License
Issues
What does the output of model.learn mean?
#1934 opened by LeZhengThu - 3
[Feature Request] Allow users to define gradient steps as a fraction of rollout time-steps
#1920 opened by janakact - 2
[Question] Running Multi-threaded PPO training independently with no interference
#1931 opened by n-kish - 5
Setting up seed in Custom Gym environment
#1932 opened by Chainesh - 1
[Question] SAC, a torch model becomes a bool somehow
#1930 opened by JaimeParker - 3
[Question] An error while using SAC and DDPG
#1923 opened by minxuef - 2
[Question] Why torch model in c++ got totally different output from python
#1925 opened by JaimeParker - 8
SubprocVecEnv Sets Out-of-Range Seeds for My Environments (ScenarioNet Environment)
#1921 opened by chrisgao99 - 1
SAC model not properly saved
#1916 opened by PabloVD - 6
Scaling Environment
#1907 opened by Hamza-101 - 5
[Bug]: evaluate_policy called multiple times for vectorized environments
#1912 opened by LukasFehring - 3
Handling mission space in BabyAI env
#1914 opened by Chainesh - 9
[Bug]: Scaling Environment
#1906 opened by Hamza-101 - 8
[Bug]: Load Trained Policy
#1911 opened by zlw21gxy - 3
[Question] Why are policy gradient loss and explained variance very small (almost zero) from the start of training?
#1897 opened by Ahmed-Radwan094 - 2
[Question] Saving PPO rollout buffer on GPU
#1891 opened by Ahmed-Radwan094 - 2
[Question] CheckpointCallback keep last K
#1893 opened by NickLucche - 2
Scalability
#1905 opened by Hamza-101 - 1
[Question] How to avoid SAC getting stuck in local minima
#1903 opened by JaimeParker - 4
[Bug]: if learning_rate function uses special types, they can cause torch.load to fail when weights_only=True
#1900 opened by markscsmith - 4
[Question] Discontinuous reward training curve
#1898 opened by JaimeParker - 1
Why does the Logger only return the train/ metrics, and not eval/, time/, and rollout/?
#1888 opened by liamquantrill - 2
Why does VecFrameStack clear the prior frames in the stack for the step when "terminated=True"?
#1883 opened by wkwan - 1
[Bug]: EOFError after running for some steps
#1890 opened by GeorgeWuzy - 3
[Feature Request] Enable predict to take tensor as input
#1896 opened by llewynS - 2
Off policy algorithm policy_kwargs
#1895 opened by suargi - 2
[Bug]: Potential Bug in PPO? Clarification requested
#1894 opened by azrael417 - 5
Exporting MultiInputActorCriticPolicy as ONNX
#1873 opened by MaximCamilleri - 2
Issue (HER within SAC algorithm)
#1892 opened by wadeKeith - 8
[Question] Discretize continuous actions/observations?
#1887 opened by nrigol - 2
[Question] influence of buffer size when using vecenv and save customized replay buffer
#1885 opened by JaimeParker - 2
How to elegantly modify an algorithm by adding new architectures trained with custom losses?
#1881 opened by jamesheald - 1
[Question] [Multiprocessing] RolloutBuffer groups environment transitions on a per-environment basis.
#1880 opened by N00bcak - 0
[Question] Control PPO training
#1872 opened by mwalidcharrwi - 1
[Question] Action masking for a DQN Agent
#1876 opened by Tim1605 - 7
[Question] Training PPO model with single step episodes
#1874 opened by oshadajay - 4
[Feature Request] Resume trained model with set_parameters without reset_num_timesteps
#1877 opened by tanielsfranklin - 1
How does stable-baselines work with a multi-agent PettingZoo environment?
#1878 opened by AnastasiaPsarou - 1
[Question] Changes in observations
#1875 opened by d505