1. Explanation :
Correct. The "cache" records values from the forward propagation units and passes them to the corresponding backward propagation units, because they are needed to compute the chain-rule derivatives.
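A minimal sketch of this idea (function and variable names are illustrative, not the course's exact code): the forward step stores the values the backward step will need, and the backward step retrieves them from the cache to apply the chain rule.

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def forward(A_prev, W, b):
    # Linear step plus activation; cache the values the
    # backward pass will need for the chain rule.
    Z = W @ A_prev + b
    A = sigmoid(Z)
    cache = (A_prev, W, b, Z)   # stored for backpropagation
    return A, cache

def backward(dA, cache):
    # Retrieve cached forward values to compute the derivatives.
    A_prev, W, b, Z = cache
    s = sigmoid(Z)
    dZ = dA * s * (1 - s)                # chain rule through the activation
    dW = dZ @ A_prev.T                   # needs the cached A_prev
    db = dZ.sum(axis=1, keepdims=True)
    dA_prev = W.T @ dZ                   # needs the cached W
    return dA_prev, dW, db

rng = np.random.default_rng(0)
A0 = rng.standard_normal((3, 5))         # 3 features, 5 examples
W1 = rng.standard_normal((4, 3))
b1 = np.zeros((4, 1))
A1, cache1 = forward(A0, W1, b1)
dA0, dW1, db1 = backward(np.ones_like(A1), cache1)
```

Without the cache, the backward pass would have to recompute Z and A_prev, or could not compute dW and dA_prev at all.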

1. Explanation :
Forward propagation propagates the input through the layers. For a shallow network we could write out every layer's computation explicitly, but for a deeper network we cannot avoid a for loop iterating over the layers.
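A small sketch of that loop (the parameter-dictionary naming is an assumption, chosen to mirror the n[l] notation used below): since the number of layers L is a variable, the layer-by-layer computation has to be a loop, not unrolled lines.

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def L_layer_forward(X, params, L):
    # Iterate over the L layers; for arbitrary L this loop
    # cannot be replaced by explicitly written-out lines.
    A = X
    for l in range(1, L + 1):
        W = params["W" + str(l)]
        b = params["b" + str(l)]
        A = relu(W @ A + b)
    return A

layer_dims = [3, 5, 4, 1]                # n[0] .. n[3]
rng = np.random.default_rng(1)
params = {}
for l in range(1, len(layer_dims)):
    params["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
    params["b" + str(l)] = np.zeros((layer_dims[l], 1))

X = rng.standard_normal((3, 7))          # 3 features, 7 examples
AL = L_layer_forward(X, params, L=3)
```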

1. Explanation :
Yes. As seen in lecture, the number of layers is counted as the number of hidden layers + 1. The input and output layers are not counted as hidden layers.

1. Explanation :
Yes. As you saw in week 3, each activation function has a different derivative. Thus, during backpropagation you need to know which activation was used in the forward propagation in order to compute the correct derivative.
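A quick illustration (helper names are mine, not from the assignment): the same upstream gradient dA yields different dZ depending on whether sigmoid or ReLU was used in the forward pass.

```python
import numpy as np

def sigmoid_backward(dA, Z):
    # Sigmoid derivative: g'(Z) = g(Z) * (1 - g(Z))
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

def relu_backward(dA, Z):
    # ReLU derivative: 1 where Z > 0, else 0
    dZ = dA.copy()
    dZ[Z <= 0] = 0
    return dZ

Z = np.array([[-1.0, 0.0, 2.0]])
dA = np.ones_like(Z)
dZ_sig = sigmoid_backward(dA, Z)   # smooth, nonzero everywhere
dZ_relu = relu_backward(dA, Z)     # zeroed where Z <= 0
```

Picking the wrong backward function here would silently produce wrong gradients, which is why the cache typically records which activation each layer used.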

1. Explanation :
W[l] has shape (n[l], n[l-1])

1. Explanation :
b[l] has shape (n[l], 1)

1. Explanation :
b[l] has shape (n[l], 1)

1. Explanation :
b[l] has shape (n[l], 1)

1. Explanation :
W[l] has shape (n[l], n[l-1])

1. Explanation :
W[l] has shape (n[l], n[l-1])
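The shape rules above can be checked with a short initialization sketch (the layer sizes are arbitrary examples): each W[l] maps n[l-1] activations to n[l] units, and each b[l] is a column vector broadcast across the examples.

```python
import numpy as np

layer_dims = [2, 4, 3, 1]   # n[0], n[1], n[2], n[3]
params = {}
for l in range(1, len(layer_dims)):
    # W[l] has shape (n[l], n[l-1]); b[l] has shape (n[l], 1).
    params["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
    params["b" + str(l)] = np.zeros((layer_dims[l], 1))
```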