So far we have seen that the derivative of a function is the instantaneous rate of change of that function. In other words, it tells us how a function's output changes as we change one of its variables. In this lesson, we will learn about the chain rule, which lets us see how a function's output changes as we change a variable that the function does not directly depend on. The chain rule may seem complicated, but it is just a matter of following a prescribed procedure. Learning it will allow us to take the derivative of the more complicated functions that we will encounter in machine learning.
Ok, now let's talk about the chain rule. Imagine that we would like to take the derivative of the following function:
$$f(x) = (0.5x + 3)^2$$
Doing something like that can be pretty tricky right off the bat. Lucky for us, we can use the chain rule. The chain rule is essentially a trick that can be applied when our functions get complicated. The first step is using functional composition to break our function down. Ok, let's do it.
$$g(x) = 0.5x + 3$$
$$f(x) = (g(x))^2$$
Let's turn these two into Python functions while we are at it.
def g_of_x(x):
    return 0.5*x + 3
g_of_x(2) # 4
4.0
def f_of_x(x):
    return (g_of_x(x))**2
f_of_x(2) # 16
16.0
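Looking at both the mathematical and code representations of $f(x)$ and $g(x)$, we can see that $f(x)$ wraps $g(x)$. So let's call $f(x)$ the outer function and $g(x)$ the inner function.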
def g_of_x(x):
    return 0.5*x + 3
def f_of_x(x): # outer function f(x)
    return (g_of_x(x))**2 # inner function g(x)
Let's plot these two functions.
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)
from graph import trace_values, plot
x_values = list(range(0, 10))
f_of_x_values = [f_of_x(x) for x in x_values]
g_of_x_values = [g_of_x(x) for x in x_values]
f_of_x_trace = trace_values(x_values, f_of_x_values, mode='lines', name='f(x) = (g(x))^2')
g_of_x_trace = trace_values(x_values, g_of_x_values, mode='lines', name='g(x) = 0.5*x + 3')
plot([g_of_x_trace, f_of_x_trace])
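Ok, so now that we have a sense of how our functions $f(x) = (g(x))^2$ and $g(x) = 0.5x + 3$ look, let's begin to take their derivatives, starting with the inner function $g(x)$.
From our rules about derivatives, the power rule tells us that the derivative of $g(x) = 0.5x + 3$ is
$$g'(x) = 1*0.5x^{0} + 0 = 0.5$$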
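One way to sanity-check the result $g'(x) = 0.5$ is to estimate the slope numerically. The sketch below assumes a small helper, `approx_derivative` (a hypothetical name, not part of the lesson's code), that uses a central difference with a step size `h` chosen arbitrarily.

```python
def g_of_x(x):
    return 0.5*x + 3

def approx_derivative(fn, x, h=1e-6):
    # Hypothetical helper: central-difference estimate of the slope of fn at x.
    return (fn(x + h) - fn(x - h)) / (2*h)

# g is a straight line, so the estimated slope should be close to 0.5 at every x.
for x in [0, 2, 10]:
    print(x, approx_derivative(g_of_x, x))
```

Because the estimate relies on floating-point arithmetic, expect values very near 0.5 rather than exactly 0.5.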
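Now a trickier question: what is the derivative of our outer function $f(x)$? That is, how does the output of $f(x)$ change as we vary $x$?
Notice that the outer function $f(x)$'s output does not vary directly with $x$. Its output is $(g(x))^2$, so it depends on $g(x)$, which in turn depends on $x$.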
def g_of_x(x):
    return 0.5*x + 3
def f_of_x(x): # outer function f(x)
    return (g_of_x(x))**2 # inner function g(x)
The chain rule: when taking the derivative $\frac{df}{dx}$ of an outer function, $f(x)$, which depends on an inner function, $g(x)$, which depends on $x$, the derivative equals the derivative of the outer function (evaluated at the inner function) times the derivative of the inner function.
Or:
$$f'(x) = f'(g(x))g'(x)$$
Ok, so that is the chain rule. Let's apply this to our example.
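As a rough sketch, the rule "derivative of the outer function times derivative of the inner function" can be written as a small Python function. The names `chain_rule_derivative`, `outer_prime`, `inner`, and `inner_prime` are hypothetical and chosen only for illustration, and the example functions used here are not the ones from this lesson.

```python
def chain_rule_derivative(outer_prime, inner, inner_prime, x):
    # Derivative of the outer function, evaluated at the inner function's output,
    # multiplied by the derivative of the inner function.
    return outer_prime(inner(x)) * inner_prime(x)

# Illustration with an outer function u**3 and an inner function 2*x + 1
# (both are assumptions made for this example only).
slope = chain_rule_derivative(
    outer_prime=lambda u: 3 * u**2,  # derivative of u**3
    inner=lambda x: 2*x + 1,
    inner_prime=lambda x: 2,         # derivative of 2*x + 1
    x=1,
)
print(slope)  # 54
```

At $x = 1$ the inner function returns 3, so the result is $3 * 3^2 * 2 = 54$.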
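Remember we started with the function $f(x) = (0.5x + 3)^2$.
1. Divide the function into two components
$$g(x) = 0.5x + 3$$
$$f(x) = (g(x))^2$$
2. Find the derivatives, $g'(x)$ and $f'(g(x))$
- as we know from the calculation above, $g'(x) = 0.5$
- and, applying the power rule to the outer function, $f'(g(x)) = 2*(g(x))^{1} = 2*g(x)$
3. Substitute into our chain rule
We have:
- $f'(x) = f'(g(x))*g'(x) = 2*g(x)*0.5 = g(x)$
Then substituting for $g(x)$, which we already defined, we have:
$$f'(x) = g(x) = 0.5x + 3$$
So the derivative of the function $f(x) = (0.5x + 3)^2$ is $f'(x) = 0.5x + 3$.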
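We can check the result $f'(x) = 0.5x + 3$ against a numerical estimate of the slope. This is only a sketch: `f_prime` and `approx_derivative` are hypothetical names introduced here, and the step size `h` is an arbitrary choice.

```python
def g_of_x(x):
    return 0.5*x + 3

def f_of_x(x):
    return (g_of_x(x))**2

def f_prime(x):
    # Result obtained with the chain rule: f'(x) = 0.5x + 3
    return 0.5*x + 3

def approx_derivative(fn, x, h=1e-6):
    # Hypothetical helper: central-difference estimate of the slope of fn at x.
    return (fn(x + h) - fn(x - h)) / (2*h)

# The two columns should agree closely at every x.
for x in [0, 2, 5]:
    print(x, f_prime(x), approx_derivative(f_of_x, x))
```

If the chain rule had been applied incorrectly, for example by forgetting to multiply by $g'(x) = 0.5$, the two columns would differ by a factor of two.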
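The chain rule allows us to determine the rate of change of a function that does not directly depend on a variable, $x$, but rather depends on a separate function that depends on $x$. For example, consider the function $f(x)$: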
def g_of_x(x):
    return 0.5*x + 3
def f_of_x(x): # outer function f(x)
    return (g_of_x(x))**2 # inner function g(x)
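It does not directly depend on $x$, but depends on a function $g(x)$, whose output varies with $x$.
Remember, taking a derivative means changing a variable $x$ a little, and seeing the change in the output. The chain rule allows us to solve the problem of seeing the change in output when our function does not directly depend on that changing variable, but depends on **a function** that depends on that variable.
We can take the derivative of a function that indirectly depends on $x$ by taking the derivative of the outer function and multiplying it by the derivative of the inner function: $f'(x) = f'(g(x))g'(x)$.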
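To make the idea of "changing $x$ a little" concrete, the sketch below nudges $x$ by a small amount and tracks how that change passes first through $g$ and then through $f$. The variable names and the nudge size `dx` are assumptions made for this illustration.

```python
def g_of_x(x):
    return 0.5*x + 3

def f_of_x(x):
    return (g_of_x(x))**2

x = 2
dx = 1e-6  # small nudge to x; the exact size is an arbitrary choice

dg = g_of_x(x + dx) - g_of_x(x)  # how much the inner function's output moved
df = f_of_x(x + dx) - f_of_x(x)  # how much the outer function's output moved

rate_g_per_x = dg / dx  # approximately g'(x) = 0.5
rate_f_per_g = df / dg  # approximately 2*g(x), which is 8 at x = 2
rate_f_per_x = df / dx  # approximately the product of the two rates above

print(rate_g_per_x, rate_f_per_g, rate_f_per_x)
```

The last rate is the product of the first two, which is the chain rule expressed in terms of small changes.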
Let's go through some more examples.
$$f(x) = (3x^2 + 10x)^3$$
Stop here, and give this a shot on your own. The answer will always be waiting for you right below, so you really have nothing to lose. No one will know if you struggle - and it's kinda the point.
1. Divide the function into two components
$$g(x) = 3x^2 + 10x$$
$$f(x) = (g(x))^3$$
2. Take the derivative of each of the component functions
$$g'(x) = 6x + 10$$
$$f'(g(x)) = 3(g(x))^2$$
3. Substitution
$$f'(x) = f'(g(x))g'(x) = 3(g(x))^2(6x+10)$$
Then substituting in $g(x) = 3x^2 + 10x$ we have:
$$f'(x) = 3(3x^2 + 10x)^2(6x+10)$$
And we can leave it there for now.
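As with the first example, a quick numerical check can confirm the result $f'(x) = 3(3x^2 + 10x)^2(6x+10)$. The helper names `f_prime` and `approx_derivative` are hypothetical, and the test points are arbitrary.

```python
def g_of_x(x):
    return 3*x**2 + 10*x

def f_of_x(x):
    return (g_of_x(x))**3

def f_prime(x):
    # Chain rule result: 3*(3x^2 + 10x)^2 * (6x + 10)
    return 3 * (3*x**2 + 10*x)**2 * (6*x + 10)

def approx_derivative(fn, x, h=1e-6):
    # Hypothetical helper: central-difference estimate of the slope of fn at x.
    return (fn(x + h) - fn(x - h)) / (2*h)

# The chain rule value and the numerical estimate should agree closely.
for x in [0.5, 1, 2]:
    print(x, f_prime(x), approx_derivative(f_of_x, x))
```

The values grow quickly here, so it is more meaningful to compare the two columns by relative difference than by absolute difference.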
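In this lesson, we learned about the chain rule. The chain rule allows us to take the derivative of a function that is composed of another function that depends on $x$. We apply the chain rule by taking the derivative of the outer function and multiplying it by the derivative of the inner function.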