- Understand that derivatives are the instantaneous rate of change of a function
- Understand how to calculate the derivative of a straight line
In the lesson discussing step sizes of our gradient descent algorithm, we filled in some more information on how to find the "best fit" regression line by using gradient descent. Namely, we learned how to efficiently change the y-intercept of the regression line to minimize the residual sum of squares.
We did this by calibrating the size and direction of our change of one regression line parameter -- let's say
Below is a curve that shows the RSS of a regression line with different values of
With our gradient descent algorithm, the larger the absolute value of the slope, the larger the change in our regression line parameter -- that is, the larger our step size. We take a much larger step when our slope is -146.17 at
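To make the link between slope and step size concrete, here is a minimal sketch of a single gradient descent update to the y-intercept. It assumes the common update rule `new_b = b - learning_rate * slope`. The learning rate of 0.01, the starting intercept of 0, and the comparison slope of -20 are hypothetical values chosen for illustration; only the slope of -146.17 comes from the text.

```python
def gradient_step(b, slope, learning_rate=0.01):
    """One gradient descent update of the y-intercept b.

    Assumes the standard rule of moving against the slope of the cost
    curve, scaled by a hypothetical learning rate. The size of the step
    is therefore proportional to the absolute value of the slope.
    """
    return b - learning_rate * slope

b = 0.0  # hypothetical starting y-intercept
steep_step = gradient_step(b, -146.17) - b   # slope taken from the text
shallow_step = gradient_step(b, -20.0) - b   # hypothetical shallower slope

print(f"step with slope -146.17: {steep_step:.4f}")
print(f"step with slope -20.00:  {shallow_step:.4f}")
```

Both slopes are negative, so both steps increase `b`; the steeper slope moves it roughly seven times farther than the shallow one.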
Here is what we know so far:
- How to apply gradient descent by using the slope of the cost curve to determine the direction and magnitude of the next step for updating the parameter of a regression line
Here is what we need to learn:
- How to find that slope or rate of change of a function at a given point.
The instantaneous rate of change at a given point is called the derivative.
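Derivatives are important because they tell us how a function is changing at any given point. Knowing that rate of change lets us anticipate how the output will respond as the input moves, which is exactly what gradient descent needs to choose its next step.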
All a derivative means is the instantaneous rate of change of a function. We already learned how to calculate the derivative of a straight line: it's the rise over the run. The rate of change of a line is constant for all points along the line, so the derivative is the same at every point. We'll focus on calculating the derivatives of straight-line, or linear, functions before moving on to calculating the derivatives of curves (like our cost curve) in a future lesson.
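One way to check that a line's rate of change is the same everywhere is to compute rise over run for several pairs of points on it. The sketch below uses a hypothetical line $y = 3x + 2$; any slope and intercept would behave the same way.

```python
def line(x, m=3.0, b=2.0):
    """A hypothetical straight line y = m*x + b."""
    return m * x + b

def rise_over_run(x1, x2):
    """Rate of change of `line` between the inputs x1 and x2."""
    return (line(x2) - line(x1)) / (x2 - x1)

# Pairs of inputs taken from different parts of the line
pairs = [(0, 1), (1, 4), (-5, 10), (2.5, 3.5)]
rates = [rise_over_run(x1, x2) for x1, x2 in pairs]
print(rates)  # [3.0, 3.0, 3.0, 3.0]
```

Every pair gives the slope `m`, no matter where on the line the points sit or how far apart they are.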
Let's say that we want a function that represents a person taking a jog. We'll represent this by drawing a straight line.
The graph above helps us see how distance changes in relation to time, or in other words, speed. So here, when we ask about rate of change, we're asking how fast our jogger is traveling.
To calculate miles per hour, we can note where the jogger is at a given time, then wait an hour and see how far he has traveled. Or we can wait two hours and divide the distance traveled by two. Generally, our technique is to divide the number of miles traveled by the number of hours passed. In this specific example, we'll imagine doing the following to calculate the speed at hour 1:
- Start a stopwatch after one hour and note the distance at that hour.
- Then, let one hour elapse and mark down the distance at that next hour.
- Finally, divide the difference in the distances by the elapsed time.
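The stopwatch technique above can be sketched in a few lines. This assumes the jogger's distance is the constant-pace function $f(x) = 3x$ that appears later in this lesson (mile 3 at hour 1, mile 6 at hour 2); the function names are illustrative.

```python
def distance(hours):
    """Miles covered after `hours`, assuming the 3 mph jog f(x) = 3x."""
    return 3 * hours

def average_speed(start_hour, elapsed_hours):
    """Note the distance at start_hour, wait elapsed_hours, note the
    distance again, then divide the difference by the elapsed time."""
    start_distance = distance(start_hour)
    end_distance = distance(start_hour + elapsed_hours)
    return (end_distance - start_distance) / elapsed_hours

print(average_speed(1, 1))  # wait one hour: 3.0
print(average_speed(1, 2))  # wait two hours, then divide by two: 3.0
```

Because the pace is constant, waiting one hour or two hours gives the same speed.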
In the graph below, we begin to calculate the speed at hour one.
We calculate our jogger's speed by seeing where he is at hour one and where he is at hour two. Our jogger went from mile three to mile six, indicated by the orange line, so miles per hour is:

$\frac{6 - 3}{2 - 1} = \frac{3}{1} = 3$ mph
Miles per hour is just one example of rate of change. Any time we come across the word "per," we know we are dealing with a rate of change. All rates of change are calculated the same way: the change in y divided by the change in x.
- Another way of expressing change in y is $y_2 - y_1$ or $\Delta y$, read "delta y".
- Likewise, another way of expressing change in x is $x_2 - x_1$ or $\Delta x$, read "delta x".
Generally, we can say that:
- rate of change $= \frac{rise}{run} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}$
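The general formula translates directly into a small helper that takes two $(x, y)$ points. The name `rate_of_change` is illustrative, and rejecting a zero run is a design choice: when $x_2 = x_1$ the division is undefined.

```python
def rate_of_change(point1, point2):
    """Rise over run between two (x, y) points: (y2 - y1) / (x2 - x1)."""
    x1, y1 = point1
    x2, y2 = point2
    delta_y = y2 - y1  # change in y, the rise
    delta_x = x2 - x1  # change in x, the run
    if delta_x == 0:
        raise ValueError("x1 and x2 must differ; the run cannot be zero")
    return delta_y / delta_x

# The jogger: (hour 1, mile 3) and (hour 2, mile 6)
print(rate_of_change((1, 3), (2, 6)))  # 3.0
```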
Just like in our example, we saw:
- miles per hour $= \frac{distance_2 - distance_1}{time_2 - time_1} = \frac{6 - 3}{2 - 1} = \frac{3}{1} = 3$ mph
Derivatives are a specific kind of rate of change -- the rate of change of a function at a given point. For a linear function, like the ones we work with here, we calculate them through rise over run, or the change in y divided by the change in x, expressed as $\frac{\Delta y}{\Delta x}$.
Stick with us: fully understanding these ideas will pay off when we take the derivatives of more complex functions.
Since our jogger is running at a constant rate, our calculated rate of change of 3 miles per hour is also the derivative. Of course, we know that in math we express our functions as $f(x)$. If we are given a function $f(x)$, we write its derivative as $f'(x)$, read "f prime of x." We can already express the derivative of a linear function $f(x)$ as:
- $f'(x) = \frac{rise}{run} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{f(x_2) - f(x_1)}{x_2 - x_1}$
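Writing the formula in terms of $f(x_2)$ and $f(x_1)$ means we only need the function and two inputs; the outputs come from evaluating the function. A minimal sketch, using the jogger's function $f(x) = 3x$ from this lesson (the helper name `derivative_between` is illustrative):

```python
def derivative_between(f, x1, x2):
    """Rate of change of f between x1 and x2: (f(x2) - f(x1)) / (x2 - x1)."""
    return (f(x2) - f(x1)) / (x2 - x1)

def f(x):
    """The jogger's distance function, f(x) = 3x."""
    return 3 * x

print(derivative_between(f, 1, 2))  # 3.0
print(derivative_between(f, 0, 5))  # 3.0
```

Passing the function itself, rather than precomputed y values, mirrors the step of replacing $y_2$ and $y_1$ with $f(x_2)$ and $f(x_1)$.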
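Take a look at the expression on the far right. You can see that we replaced $y_2$ and $y_1$ with $f(x_2)$ and $f(x_1)$, since the output $y$ of a function at an input $x$ is $f(x)$.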
We indicate that we are calculating the derivative of $f(x)$ at hour one by writing $f'(1)$. In our jogger example, we set:
- $x_1 = 1$, as hour 1 is our starting point
- $x_2 = 2$, as hour 2 is our ending point

giving us:

$f'(1) = \frac{f(2) - f(1)}{2 - 1} = \frac{6 - 3}{2 - 1} = 3$

So:
- $f(1)$ means the output at $x = 1$, or in our example, the distance at hour one, and
- $f'(1)$ means the rate of change at $x = 1$, or in our example, the speed at hour one.
Because the jogger's speed never changes, and the derivative is the rate of change at a given point, we can conclude that the derivative also never changes. Let's plot the distance from hours zero through five on the left and the speed from hours zero through five on the right to visualize this steady pace.
- To the left is a graph of $f(x) = 3x$ for different values of $x$.
- To the right is a plot of the rate of change of that function, $f'(x)$, for different values of $x$.
- So while the distance changes through time, the speed, or rate of change, stays the same.
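The numbers behind the two plots can be tabulated without any plotting library. This sketch assumes the jogger's function $f(x) = 3x$ and measures the speed at each hour over the following hour, a run of 1, as in the stopwatch technique.

```python
def f(x):
    """Distance in miles after x hours for the jogger, f(x) = 3x."""
    return 3 * x

def speed(hour):
    """Speed at `hour`, measured over the following hour (rise over a run of 1)."""
    return (f(hour + 1) - f(hour)) / 1

hours = range(6)  # hours zero through five, as in the plots
distances = [f(h) for h in hours]
speeds = [speed(h) for h in hours]

for h, d, s in zip(hours, distances, speeds):
    print(f"hour {h}: distance {d} miles, speed {s} mph")
```

The distance column grows by 3 each hour while the speed column stays at 3.0, matching the rising line on the left and the flat line on the right.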
Our formulas above for calculating the derivative do the job, but they don't exactly express the technique we used for our jogger. Remember that our technique for calculating the jogger's speed is the following:
- Start a stopwatch after one hour and see the distance at that hour.
- Then, let one hour elapse and see the distance at that next hour.
- Finally, divide the difference in the distances by the elapsed time.
In terms of math, this looks like:

$f'(x) = \frac{\Delta y}{\Delta x} = \frac{f(x_1 + \Delta x) - f(x_1)}{\Delta x}$
Let's take a second to fully understand this new formula because it's not going away.
- $f'(x)$ is the rate of change at a given value, or here the speed at a given time
- $f(x)$ is the distance at a given time, and $f(x_1)$ is the distance at the starting time, $x_1$
- The elapsed time is $\Delta x$, the change in x.
- $f(x_1 + \Delta x)$ is the distance at the starting time plus the elapsed time
This is the definition that we will often see. It expresses our technique for calculating the derivative.
- Subtract the output at one input, x, from the output at that initial input plus a change in x.
- Then divide that difference by the change in x.
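The two steps above map one-to-one onto a short function. This is a sketch using the jogger's $f(x) = 3x$; the names `derivative` and `delta_x` are illustrative. For a linear function, the size of $\Delta x$ does not change the result.

```python
def derivative(f, x1, delta_x):
    """Rate of change of f at x1: (f(x1 + delta_x) - f(x1)) / delta_x."""
    # Step 1: subtract the output at x1 from the output at x1 + delta_x
    change_in_output = f(x1 + delta_x) - f(x1)
    # Step 2: divide that difference by the change in x
    return change_in_output / delta_x

def f(x):
    """The jogger's distance function, f(x) = 3x."""
    return 3 * x

# For a linear function, every choice of delta_x gives the same answer.
for delta_x in (1, 2, 0.5):
    print(delta_x, derivative(f, 1, delta_x))
```

With `delta_x = 1` this is exactly the stopwatch technique: the distance at hour 2 minus the distance at hour 1, divided by one hour.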
In summary, that is the derivative of a line, or the rate of change of a linear function. The rate of change answers how much our output is changing at a given point.
In this lesson, we saw that the derivative is the change in output per change in input. In the case of our jogger, the input was time and the output was distance traveled. We learned that the derivative is the change in the runner's distance traveled divided by the amount of time passed.
Graphically, we see that the derivative is simply the rise over run, or the change in y divided by the change in x:

$f'(x) = \frac{\Delta y}{\Delta x}$
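Then we saw that we can express the derivative in terms of $x_1$ and $x_2$:

$f'(x) = \frac{f(x_2) - f(x_1)}{x_2 - x_1}$

And finally, we saw how we can express the derivative in terms of $\Delta x$:

$f'(x) = \frac{f(x_1 + \Delta x) - f(x_1)}{\Delta x}$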