- Learning about Batch Gradient Descent: http://ruder.io/optimizing-gradient-descent/index.html#
- https://towardsdatascience.com/gradient-descent-in-python-a0d07285742f
- Analogy with climbing down a mountain:
    - Size of the steps you take in any direction = Learning rate
    - Gadget telling you your current height = Cost function
    - Direction of your steps = Gradients
- Cost function to calculate the cost (error) for a given Theta vector
- Gradient descent function to compute the updated Theta vector at each iteration
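The two pieces above can be sketched as a minimal batch gradient descent for linear regression. This is an illustrative implementation, not from the linked articles: the mean-squared-error cost, the learning rate, and the toy data are all assumptions chosen for the example.

```python
import numpy as np

def cost(X, y, theta):
    # MSE cost: the "gadget telling you your height" (assumed cost function)
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)

def gradient_descent(X, y, theta, learning_rate=0.1, iterations=2000):
    # Batch gradient descent: each step uses the FULL dataset to compute
    # the gradient (the "direction of your steps"), scaled by the
    # learning rate (the "size of steps").
    m = len(y)
    for _ in range(iterations):
        gradients = X.T @ (X @ theta - y) / m
        theta = theta - learning_rate * gradients
    return theta

# Toy data: y = 4 + 3x, with a bias column of ones prepended to X
X = np.c_[np.ones(5), np.arange(5)]
y = 4 + 3 * np.arange(5)
theta = gradient_descent(X, y, np.zeros(2))
```

On this noiseless toy data, `theta` converges close to `[4, 3]`, and `cost(X, y, theta)` approaches zero as iterations increase.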