• Below is a network with a few layers. Let's pick one layer.


  • (Figure: forward propagation in a deep network, building blocks)


  • And look at the computations, focusing on just that layer for now. For layer l, you have parameters \( w^{[l]} \) and \( b^{[l]} \), and for the forward prop you input the activations \( a^{[l-1]} \) from the previous layer and output \( a^{[l]} \).


  • We compute \( z^{[l]} = w^{[l]} a^{[l-1]} + b^{[l]} \), and then \( a^{[l]} = g(z^{[l]}) \).


  • So, you go from the input \( a^{[l-1]} \) to the output \( a^{[l]} \).


  • It will also be useful to cache the value \( z^{[l]} \) for later use in the backpropagation step.
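
  • As a concrete illustration, here is a minimal sketch of this forward step for a single layer, assuming a sigmoid activation for \( g \) and NumPy arrays with \( w^{[l]} \) of shape \( (n^{[l]}, n^{[l-1]}) \) and \( b^{[l]} \) of shape \( (n^{[l]}, 1) \); the function and variable names are illustrative, not from any particular library.

```python
import numpy as np

def sigmoid(z):
    """Elementwise sigmoid activation g(z)."""
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(a_prev, W, b):
    """Forward step for one layer: a^{[l-1]} -> a^{[l]}, caching z^{[l]}."""
    z = W @ a_prev + b          # z^{[l]} = w^{[l]} a^{[l-1]} + b^{[l]}
    a = sigmoid(z)              # a^{[l]} = g(z^{[l]})
    cache = (a_prev, W, b, z)   # keep what the backward step will need
    return a, cache
```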


  • And then for the backward step, or the backpropagation step, again focusing on the computation for this layer l, you're going to implement a function that inputs \( da^{[l]} \) and outputs \( da^{[l-1]} \).


  • So just to summarize: in layer l, you have the forward step, or forward propagation function, which inputs \( a^{[l-1]} \) and outputs \( a^{[l]} \); to make this computation, it uses \( w^{[l]} \) and \( b^{[l]} \).


  • Then the backward function, for the backpropagation step, is another function that inputs \( da^{[l]} \) and outputs \( da^{[l-1]} \).
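
  • A matching sketch of the backward step for one layer, again assuming a sigmoid activation and the cache produced by the hypothetical layer_forward above: it takes \( da^{[l]} \) and returns \( da^{[l-1]} \) together with \( dw^{[l]} \) and \( db^{[l]} \).

```python
import numpy as np

def layer_backward(da, cache):
    """Backward step for one layer: da^{[l]} -> da^{[l-1]}, dw^{[l]}, db^{[l]}."""
    a_prev, W, b, z = cache
    m = a_prev.shape[1]                            # number of training examples
    a = 1.0 / (1.0 + np.exp(-z))                   # recompute g(z) for sigmoid
    dz = da * a * (1 - a)                          # dz^{[l]} = da^{[l]} * g'(z^{[l]})
    dW = (dz @ a_prev.T) / m                       # dw^{[l]}
    db = np.sum(dz, axis=1, keepdims=True) / m     # db^{[l]}
    da_prev = W.T @ dz                             # da^{[l-1]}
    return da_prev, dW, db
```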


  • (Figure: one step of the neural network)


  • If you can implement these two functions, then the basic computation of the neural network is as follows.


  • You're going to take the input features \( a^{[0]} \), feed that in, and that will compute the activations of the first layer \( a^{[1]} \); to do that, you need \( w^{[1]} \) and \( b^{[1]} \), and you will also cache \( z^{[1]} \).


  • Continuing through to the last layer L, you end up outputting \( a^{[L]} \), which is equal to \( \hat{y} \). Along the way, we cached all of the values \( z^{[l]} \).
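
  • The full forward pass just chains this layer step over all L layers. Below is a sketch, assuming parameters are stored as lists of NumPy arrays (Ws[l-1] and bs[l-1] holding \( w^{[l]} \) and \( b^{[l]} \)) and a layer_forward like the one sketched earlier; these storage conventions are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(a_prev, W, b):
    z = W @ a_prev + b
    return sigmoid(z), (a_prev, W, b, z)

def model_forward(X, Ws, bs):
    """Forward prop through all L layers, collecting one cache per layer."""
    a = X                       # a^{[0]} is the input
    caches = []
    for W, b in zip(Ws, bs):    # layers 1, ..., L
        a, cache = layer_forward(a, W, b)
        caches.append(cache)
    return a, caches            # a is now a^{[L]} = y_hat
```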


  • Now, for the back propagation step, we are going backwards and computing gradients like so.


  • We feed in \( da^{[L]} \), and this box gives us \( da^{[L-1]} \), and so on until we get \( da^{[1]} \).


  • Along the way, backprop also ends up outputting \( dw^{[l]} \), \( db^{[l]} \), and \( dz^{[l]} \) for each layer.
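
  • And the full backward pass walks the cached layers in reverse. Here is a sketch under the same assumptions, reusing the single-layer backward step from above.

```python
import numpy as np

def layer_backward(da, cache):
    a_prev, W, b, z = cache
    m = a_prev.shape[1]
    a = 1.0 / (1.0 + np.exp(-z))
    dz = da * a * (1 - a)
    return W.T @ dz, (dz @ a_prev.T) / m, np.sum(dz, axis=1, keepdims=True) / m

def model_backward(da_L, caches):
    """Walk back from layer L to layer 1, collecting dw^{[l]} and db^{[l]}."""
    grads = []
    da = da_L
    for cache in reversed(caches):      # layers L, L-1, ..., 1
        da, dW, db = layer_backward(da, cache)
        grads.append((dW, db))
    grads.reverse()                     # grads[l-1] now holds (dw^{[l]}, db^{[l]})
    return grads
```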


  • So one iteration of training a neural network involves the chain of computations shown in the figure below. In the cache we store the values \( w^{[l]} \), \( b^{[l]} \), and \( z^{[l]} \), as they are useful in the backpropagation step.


  • (Figure: forward propagation in a deep network, building blocks)
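
  • Putting the pieces together, here is a self-contained sketch of one training iteration (forward pass, loss derivative, backward pass, gradient descent update), assuming sigmoid activations in every layer and a binary cross-entropy loss; the starting derivative \( da^{[L]} = -\left(\frac{y}{a^{[L]}} - \frac{1-y}{1-a^{[L]}}\right) \) is specific to that loss, and all names and the learning rate are illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def layer_forward(a_prev, W, b):
    z = W @ a_prev + b
    return sigmoid(z), (a_prev, W, b, z)

def layer_backward(da, cache):
    a_prev, W, b, z = cache
    m = a_prev.shape[1]
    a = sigmoid(z)
    dz = da * a * (1 - a)
    return W.T @ dz, (dz @ a_prev.T) / m, np.sum(dz, axis=1, keepdims=True) / m

def train_iteration(X, Y, Ws, bs, lr=0.01):
    """One forward/backward pass plus a gradient descent parameter update."""
    # Forward: a^{[0]} -> a^{[1]} -> ... -> a^{[L]} = y_hat, caching each layer.
    a, caches = X, []
    for W, b in zip(Ws, bs):
        a, cache = layer_forward(a, W, b)
        caches.append(cache)
    # Derivative of the binary cross-entropy loss with respect to a^{[L]}.
    da = -(np.divide(Y, a) - np.divide(1 - Y, 1 - a))
    # Backward: da^{[L]} -> ... -> da^{[1]}, collecting dw^{[l]}, db^{[l]}.
    grads = []
    for cache in reversed(caches):
        da, dW, db = layer_backward(da, cache)
        grads.append((dW, db))
    grads.reverse()
    # Gradient descent update for every layer's parameters.
    for l, (dW, db) in enumerate(grads):
        Ws[l] -= lr * dW
        bs[l] -= lr * db
    return Ws, bs
```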


  • In the next section, we'll talk about how you can actually implement these building blocks.