dennybritz/reinforcement-learning
Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, Tensorflow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.
Jupyter Notebook · MIT
Issues
MC Control with Epsilon-Greedy Policies ---Epsilon Value and Best Action prob error
#252 opened by hardik-kansal - 1
Gambler's Problem: 0 Stake Allowed?
#223 opened by mparigi - 12
Questionable result in Gamblers Problem Solution
#172 opened by bminixhofer - 1
Reinforcement learning policy
#238 opened by Comp-Engr18 - 0
Vanilla REINFORCE implementation
#200 opened by alek5k - 0
Typo in: "Model-Free Prediction & Control with Monte Carlo (MC)" section -> "Blackjack Playground.ipynb" file:
#244 opened by Almujtaba-Yaseen - 0
Minor Link fix
#239 opened by gitDawn - 1
Deep Q Learning, neither works with tensorflow 1.x nor with tensorflow 2.x
#217 opened by azharsalman - 1
DQN Testing Rewards on Atari Games
#236 opened by willtop - 0
Clarification on DQN testing rewards on Atari games
#235 opened by willtop - 0
Lecture Slides need an update
#232 opened by harsh306 - 1
Randomness in optimal epsilon_greedy_policy
#196 opened by levindabhi - 0
Is the Implementation correct?
#170 opened by Nerdyvedi - 0
Monte Carlo AssertionError: defaultdict(<function mc_control_importance_sampling.<locals>.<lambda> at 0x7f31699ffe18>, {}) (<class 'collections.defaultdict'>)
#231 opened by NC25 - 1
Policy Evaluation Exercise Solution Is Wrong
#229 opened by ugrkm - 0
DQL size error
#227 opened by johan606303 - 2
Is a line missing in 'MC Control with Epsilon-Greedy Policies Solution.ipynb'?
#220 opened by Ritz111 - 2
Provided policy_improvement() solution initializes values to zero for each iteration
#204 opened by link2xt - 0
why DQN use kernel size 8 ?
#222 opened by opentld - 2
Why is Chapter 11 excluded?
#221 opened by BedirT - 6
Can an agent learn valid actions offline, being able to choose only actions that were already taken (e.g. from historical data) ? [question]
#218 opened by VieVaWaldi - 1
Could anyone show me reason why use 4 same grayscale frames when training DQN?
#213 opened by roachsinai - 0
log
#208 opened by Mahsa-Bastankhah - 2
OSError: [Errno 12] Cannot allocate memory
#173 opened by VictorLeeLk - 1
Blackjack - Monte Carlo Prediction
#182 opened by rahulptel - 2
Unstable reinforce with baseline model
#192 opened by Jacobi93 - 0
feed action to critic network
#190 opened by ehsaneshaghi - 0
How to restore model
#189 opened by tdr1991 - 0
DQN Dense Tensor Using too Much Memory
#183 opened by nflu - 0
You don't follow the book?
#185 opened by alexmosc - 0
Define an envirement
#179 opened by ewtrends - 6
policy evaluation algorithm and implementation bug
#177 opened by hamifthi - 5
[bug] DQN/dqn.py: Incorrect loss function. [question] Question about RMSProp paramethers
#174 opened by Kropekk - 1
Continuous MountainCar Actor Critic issue
#168 opened by zhouPengF - 0
Policy Gradient, when action space is 40, how can I sample action from Gaussian?
#167 opened by GoingMyWay - 0
eval() throws an error
#161 opened by lechatthecat