dennybritz/reinforcement-learning

Implementation of Reinforcement Learning Algorithms. Python, OpenAI Gym, TensorFlow. Exercises and Solutions to accompany Sutton's Book and David Silver's course.

Jupyter Notebook · MIT

Issues

MC Control with Epsilon-Greedy Policies: Epsilon Value and Best Action prob error
#252 opened a year ago by hardik-kansal
2
Gambler's Problem: 0 Stake Allowed?
#223 opened 5 years ago by mparigi
1
Questionable result in Gamblers Problem Solution
#172 opened 6 years ago by bminixhofer
12
Reinforcement learning policy
#238 opened 3 years ago by Comp-Engr18
1
demystifying-deep-reinforcement-learning link is broken
#250 opened 2 years ago by kiankyars
0
Vanilla REINFORCE implementation
#200 opened 6 years ago by alek5k
2
please provide requirements.txt or mention the exact version of packages used.
#247 opened 2 years ago by Nahdus
0
Issue in: reinforcement-learning/MC/MC Prediction Solution.ipynb
#246 opened 2 years ago by Almujtaba-Yaseen
0
Typo in: "Model-Free Prediction & Control with Monte Carlo (MC)" section -> "Blackjack Playground.ipynb" file:
#244 opened 2 years ago by Almujtaba-Yaseen
0
A small correction in "MDPs and Bellman Equations" section
#243 opened 2 years ago by Almujtaba-Yaseen
0
Batch update for Continuous Mountain Car Actor-Critic
#180 opened 6 years ago by GoingMyWay
1
Policy Gradient Methods: Loss function of policy estimator in REINFORCE
#181 opened 3 years ago by ArikVoronov
3
Minor Link fix
#239 opened 3 years ago by gitDawn
0
Deep Q Learning, neither works with tensorflow 1.x nor with tensorflow 2.x
#217 opened 5 years ago by azharsalman
1
DQN Testing Rewards on Atari Games
#236 opened 4 years ago by willtop
1
Clarification on DQN testing rewards on Atari games
#235 opened 4 years ago by willtop
0
Lecture Slides need an update
#232 opened 4 years ago by harsh306
0
Randomness in optimal epsilon_greedy_policy
#196 opened 4 years ago by levindabhi
1
Is the Implementation correct?
#170 opened 4 years ago by Nerdyvedi
0
Monte Carlo AssertionError: defaultdict(<function mc_control_importance_sampling.<locals>.<lambda> at 0x7f31699ffe18>, {}) (<class 'collections.defaultdict'>)
#231 opened 4 years ago by NC25
0
Policy Evaluation Exercise Solution Is Wrong
#229 opened 4 years ago by ugrkm
1
DQL size error
#227 opened 5 years ago by johan606303
0
Some question in MC Control with Epsilon-Greedy Policies Solution.ipynb
#224 opened 5 years ago by josephbak
2
Why CliffWalkingEnv returns 'is_done=True' when reaching cliff?
#219 opened 5 years ago by wakamori
2
Is a line missing in 'MC Control with Epsilon-Greedy Policies Solution.ipynb'?
#220 opened 5 years ago by Ritz111
1
Policy iteration solution only shows 1 optimal solution
#212 opened 5 years ago by duongnhatthang
2
Provided policy_improvement() solution initializes values to zero for each iteration
#204 opened 6 years ago by link2xt
2
Why does DQN use kernel size 8?
#222 opened 5 years ago by opentld
0
Why is Chapter 11 excluded?
#221 opened 5 years ago by BedirT
2
Can an agent learn valid actions offline, being able to choose only actions that were already taken (e.g. from historical data) ? [question]
#218 opened 5 years ago by VieVaWaldi
6
Could anyone show me reason why use 4 same grayscale frames when training DQN?
#213 opened 5 years ago by roachsinai
1
log
#208 opened 5 years ago by Mahsa-Bastankhah
0
OSError: [Errno 12] Cannot allocate memory
#173 opened 6 years ago by VictorLeeLk
2
Provided policy_improvement() solution is not guaranteed to terminate
#203 opened 6 years ago by link2xt
1
policy_improvement() should be renamed to policy_iteration()
#202 opened 6 years ago by link2xt
0
Blackjack - Monte Carlo Prediction
#182 opened 6 years ago by rahulptel
1
Unstable reinforce with baseline model
#192 opened 6 years ago by Jacobi93
2
feed action to critic network
#190 opened 6 years ago by ehsaneshaghi
0
How to restore model
#189 opened 6 years ago by tdr1991
0
The output layer should not use the ReLU activation function.
#186 opened 6 years ago by wanjunhong0
0
Why RBFSampler from sklearn is used as the feature in the FA example?
#184 opened 6 years ago by zyongxu
0
DQN Dense Tensor Using too Much Memory
#183 opened 6 years ago by nflu
1
You don't follow the book?
#185 opened 6 years ago by alexmosc
0
Define an environment
#179 opened 6 years ago by ewtrends
0
policy evaluation algorithm and implementation bug
#177 opened 6 years ago by hamifthi
6
[bug] DQN/dqn.py: Incorrect loss function. [question] Question about RMSProp parameters
#174 opened 6 years ago by Kropekk
5
Policy Evaluation Solution VS Sutton's Page 75
#171 opened 6 years ago by benjamintanweihao
1
Continuous MountainCar Actor Critic issue
#168 opened 7 years ago by zhouPengF
0
Policy Gradient, when action space is 40, how can I sample action from Gaussian?
#167 opened 7 years ago by GoingMyWay
0
eval() throws an error
#161 opened 7 years ago by lechatthecat
0