A list of stupid mistakes made while implementing this algorithm.
- Always check numpy array shapes. Specifically, check that you haven't broadcast a (64,)-shaped array against a (64, 1)-shaped array! 🤦
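  A minimal sketch of how this slips through silently (the array names here are made up for illustration):

  ```python
  import numpy as np

  rewards = np.ones(64)          # shape (64,)
  q_values = np.ones((64, 1))    # shape (64, 1)

  # Broadcasting turns the element-wise subtraction you wanted into
  # an all-pairs (64, 64) matrix, and numpy raises no error.
  wrong = rewards - q_values
  print(wrong.shape)             # (64, 64)

  # Squeezing the trailing axis first gives the intended result.
  right = rewards - q_values.squeeze(-1)
  print(right.shape)             # (64,)
  ```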
- Check every variable. I spent ages trying to figure out why nothing was being learned, only to discover that instead of returning states and next_states from the memory buffer sample I was just returning states and states! 🤦
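  A minimal replay-buffer sketch (not the repo's actual class; names are illustrative) showing where the bug hid:

  ```python
  import random
  from collections import deque

  class ReplayBuffer:
      def __init__(self, capacity=100_000):
          self.buffer = deque(maxlen=capacity)

      def store(self, state, action, reward, next_state, done):
          self.buffer.append((state, action, reward, next_state, done))

      def sample(self, batch_size):
          batch = random.sample(self.buffer, batch_size)
          states, actions, rewards, next_states, dones = zip(*batch)
          # The bug was a one-word slip: returning `states` twice here,
          # so the TD target bootstrapped from the current state and
          # the agent learned nothing.
          return states, actions, rewards, next_states, dones
  ```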
- Copied and pasted the actor network while building the critic and accidentally forgot to remove the `tanh` activation, meaning the critic could at most predict a total reward of 1 or -1 for the entire episode given any state and action pair! 🤦
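  A sketch of the failure mode, written in PyTorch for illustration (the repo's framework and layer sizes may differ):

  ```python
  import torch
  import torch.nn as nn

  class Critic(nn.Module):
      """Q(s, a) network. The output layer must stay linear."""
      def __init__(self, state_dim, action_dim):
          super().__init__()
          self.net = nn.Sequential(
              nn.Linear(state_dim + action_dim, 256), nn.ReLU(),
              nn.Linear(256, 256), nn.ReLU(),
              nn.Linear(256, 1),
              # The bug: a leftover nn.Tanh() here squashes every
              # Q-value into [-1, 1], so any return outside that range
              # is unrepresentable and learning stalls.
          )

      def forward(self, state, action):
          return self.net(torch.cat([state, action], dim=-1))
  ```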
- Left the hard-coded high action bound from training on the pendulum environment as a default when initializing the actor model. I correctly adjusted it for the actor on the agent class but not for the target actor, meaning the target actor would always output 2 times the action the actor would! 🤦
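  A sketch of the fix (the `Actor` class, names, and the gymnasium calls are illustrative, not the repo's code): read the bound off the environment and pass it explicitly to *every* copy of the actor, never relying on a default:

  ```python
  import torch
  import torch.nn as nn
  import gymnasium as gym

  class Actor(nn.Module):
      def __init__(self, state_dim, action_dim, action_bound):
          super().__init__()
          self.net = nn.Sequential(
              nn.Linear(state_dim, 256), nn.ReLU(),
              nn.Linear(256, action_dim), nn.Tanh(),
          )
          self.action_bound = action_bound  # scales the tanh output

      def forward(self, state):
          return self.net(state) * self.action_bound

  env = gym.make("Pendulum-v1")
  state_dim = env.observation_space.shape[0]
  action_dim = env.action_space.shape[0]
  action_high = float(env.action_space.high[0])  # 2.0 for Pendulum

  # Passing the bound to the actor but letting the target actor fall
  # back on a hard-coded default is exactly the bug described above.
  actor = Actor(state_dim, action_dim, action_bound=action_high)
  target_actor = Actor(state_dim, action_dim, action_bound=action_high)
  target_actor.load_state_dict(actor.state_dict())
  ```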