**Below is a network** of a few layers. Let's pick one layer.

**And look into the computations** focusing on just that layer for now. So for layer \( l \), you have some parameters \( w^{[l]} \) and \( b^{[l]} \), and for the forward prop, you will input the activations \( a^{[l-1]} \) from the previous layer and output \( a^{[l]} \).

**We compute** \( z^{[l]} = w^{[l]} a^{[l-1]} + b^{[l]} \), and then \( a^{[l]} = g(z^{[l]}) \).
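
As a quick dimension check, here is a sketch assuming layer \( l \) has \( n^{[l]} \) units and the \( m \) training examples are stacked column-wise (a convention not stated above, but the usual one for this notation):

\[
\underbrace{z^{[l]}}_{(n^{[l]},\, m)} = \underbrace{w^{[l]}}_{(n^{[l]},\, n^{[l-1]})} \underbrace{a^{[l-1]}}_{(n^{[l-1]},\, m)} + \underbrace{b^{[l]}}_{(n^{[l]},\, 1)}
\]

where \( b^{[l]} \) is broadcast across the \( m \) columns.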

**So, you go from the input** \( a^{[l-1]} \) to the output \( a^{[l]} \).

**For later use**, it'll also be helpful to cache the value \( z^{[l]} \) for the backpropagation step.
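
As a concrete illustration, here is a minimal numpy sketch of this forward function; the function name, the choice of ReLU for \( g \), and the exact cache contents are assumptions for the example, not prescribed above:

```python
import numpy as np

def relu(z):
    # Illustrative choice of activation g; any differentiable g works here.
    return np.maximum(0, z)

def linear_activation_forward(a_prev, w, b, g=relu):
    """Forward step for one layer l.

    a_prev -- activations a^[l-1], shape (n^[l-1], m)
    w, b   -- parameters w^[l], shape (n^[l], n^[l-1]), and b^[l], shape (n^[l], 1)
    Returns a^[l] and a cache of the values the backward step will need.
    """
    z = w @ a_prev + b          # z^[l] = w^[l] a^[l-1] + b^[l]
    a = g(z)                    # a^[l] = g(z^[l])
    cache = (a_prev, w, b, z)   # cache z^[l] (plus a^[l-1], w^[l], b^[l]) for backprop
    return a, cache
```

Caching \( a^{[l-1]} \) and \( w^{[l]} \) alongside \( z^{[l]} \) saves recomputing them in the backward step.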

**And then for the backward step**, or the backpropagation step, again focusing on the computation for this layer \( l \), you're going to implement a function that inputs \( da^{[l]} \) and outputs \( da^{[l-1]} \).

**So just to summarize**: in layer \( l \), you're going to have the forward step, or the forward function, which inputs \( a^{[l-1]} \) and outputs \( a^{[l]} \); in order to make this computation, you need to use \( w^{[l]} \) and \( b^{[l]} \).

**Then the backward function**, the backprop step, will be another function that inputs \( da^{[l]} \) and outputs \( da^{[l-1]} \).
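
A matching backward sketch, under the same illustrative assumptions (ReLU activation and the cache layout from the forward sketch; the \( 1/m \) factor comes from averaging the cost over \( m \) examples):

```python
def relu_backward(da, z):
    # Derivative of ReLU: the gradient passes through only where z > 0.
    return da * (z > 0)

def linear_activation_backward(da, cache):
    """Backward step for one layer l.

    da    -- gradient of the cost w.r.t. a^[l], shape (n^[l], m)
    cache -- (a_prev, w, b, z) stored during the forward step
    Returns da^[l-1] together with the parameter gradients dw^[l], db^[l].
    """
    a_prev, w, b, z = cache
    m = a_prev.shape[1]
    dz = relu_backward(da, z)                    # dz^[l] = da^[l] * g'(z^[l])
    dw = (dz @ a_prev.T) / m                     # dw^[l], same shape as w^[l]
    db = np.sum(dz, axis=1, keepdims=True) / m   # db^[l], same shape as b^[l]
    da_prev = w.T @ dz                           # da^[l-1], passed to layer l-1
    return da_prev, dw, db
```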

**If you can implement** these two functions, then the basic computation of the neural network will be as follows.

**You're going to take the input features** \( a^{[0]} \), feed them in, and that will compute the activations of the first layer, \( a^{[1]} \); to do that you need \( w^{[1]} \) and \( b^{[1]} \), and you'll also cache \( z^{[1]} \).

**Continuing until the last layer**, you end up outputting \( a^{[L]} \), which is equal to \( \hat{y} \). During this process, we cached all of the values \( z^{[l]} \).
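
Chaining the per-layer forward function across all the layers might look like the following sketch, reusing `linear_activation_forward` from above and assuming `params` is a list of \( (w^{[l]}, b^{[l]}) \) pairs; a real network would typically use a different activation on the output layer:

```python
def forward_pass(x, params):
    """Full forward propagation: feed in x = a^[0], return y_hat and all caches."""
    a = x
    caches = []
    for w, b in params:                  # one (w^[l], b^[l]) pair per layer
        a, cache = linear_activation_forward(a, w, b)
        caches.append(cache)             # keep every layer's cache for backprop
    return a, caches                     # final a is a^[L] = y_hat
```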

**Now, for the backpropagation step**, we go backwards through the network and compute the gradients.

**We feed in** \( da^{[L]} \), and then this box will give us \( da^{[L-1]} \), and so on until we get \( da^{[1]} \).

**Along the way**, backward prop also ends up outputting \( dw^{[l]} \), \( db^{[l]} \), and \( dz^{[l]} \).
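
Stitching the per-layer backward function together, again as an assumed sketch built on the helpers above:

```python
def backward_pass(da_last, caches):
    """Full backpropagation: feed in da^[L], return (dw^[l], db^[l]) per layer."""
    grads = []
    da = da_last
    for cache in reversed(caches):       # walk the layers from L down to 1
        da, dw, db = linear_activation_backward(da, cache)
        grads.append((dw, db))
    grads.reverse()                      # so grads[l-1] holds (dw^[l], db^[l])
    return grads
```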

**So one iteration of training** through a neural network involves many computations, as shown in the figure below. We store the values \( w^{[l]} \), \( b^{[l]} \), and \( z^{[l]} \) in the cache, as they are useful in the backward propagation step.
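
Putting the pieces together, one training iteration could look like the following sketch; the initial gradient \( da^{[L]} = \hat{y} - y \) and the plain gradient-descent update are illustrative assumptions that depend on the loss you actually use:

```python
def train_step(x, y, params, lr=0.01):
    """One training iteration: forward pass, backward pass, gradient descent."""
    y_hat, caches = forward_pass(x, params)
    da_last = y_hat - y                  # illustrative da^[L]; depends on the loss
    grads = backward_pass(da_last, caches)
    return [(w - lr * dw, b - lr * db)   # w^[l] -= lr * dw^[l], same for b^[l]
            for (w, b), (dw, db) in zip(params, grads)]
```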

**In the next section**, we'll talk about how you can actually implement these building blocks.