1. Explanation:
    Correct. The "cache" records values computed during forward propagation and passes them to the corresponding backward propagation step, where they are needed to compute the derivatives via the chain rule.
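    A minimal numpy sketch of this idea for a single linear layer; the function and variable names (linear_forward, linear_backward) are illustrative and not taken from any particular assignment:

```python
import numpy as np

def linear_forward(A_prev, W, b):
    # Forward pass for one layer: Z = W A_prev + b.
    # The cache stores the values that backpropagation will need later.
    Z = W @ A_prev + b
    cache = (A_prev, W, b)
    return Z, cache

def linear_backward(dZ, cache):
    # The cached forward values are exactly what the chain rule needs.
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = (dZ @ A_prev.T) / m                    # dL/dW uses A_prev from the cache
    db = np.sum(dZ, axis=1, keepdims=True) / m  # dL/db
    dA_prev = W.T @ dZ                          # dL/dA_prev uses W from the cache
    return dA_prev, dW, db
```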











  1. Explanation:
    Forward propagation passes the input through the layers one at a time. For a shallow network we could simply write out each layer's computation explicitly, but in a deeper network we cannot avoid a for loop iterating over the layers, as sketched below.
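    A minimal sketch of such a loop, assuming parameters are stored in a dictionary keyed "W1", "b1", ..., "WL", "bL", and using ReLU for every layer for brevity (the output activation is omitted):

```python
import numpy as np

def relu(Z):
    return np.maximum(0, Z)

def L_model_forward(X, parameters):
    # parameters holds "W1", "b1", ..., "WL", "bL"; L is the number of layers.
    L = len(parameters) // 2
    A = X
    # The explicit for loop over layers is unavoidable for arbitrary depth L.
    for l in range(1, L + 1):
        W = parameters["W" + str(l)]
        b = parameters["b" + str(l)]
        Z = W @ A + b
        A = relu(Z)
    return A
```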







  1. Explanation:
    Yes. As seen in lecture, the number of layers is counted as the number of hidden layers + 1 (the output layer). The input and output layers are not counted as hidden layers.
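    A small worked example of this counting convention, using a hypothetical layer_dims list:

```python
# Hypothetical layer sizes: input (not counted), two hidden layers, output.
layer_dims = [4, 5, 3, 1]   # n_x = 4, n_h1 = 5, n_h2 = 3, n_y = 1

# Number of layers L = number of hidden layers + 1 (the output layer) = 2 + 1 = 3.
L = len(layer_dims) - 1
print(L)  # 3
```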





  1. Explanation:
    Yes. As you saw in Week 3, each activation function has a different derivative. During backpropagation you therefore need to know which activation was used in the forward propagation in order to compute the correct derivative.
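    A minimal sketch of how a backward step can dispatch on the activation recorded during the forward pass; the function name and structure here are illustrative:

```python
import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

def activation_backward(dA, Z, activation):
    # dZ depends on which activation was used in the forward pass,
    # because each activation function has a different derivative.
    if activation == "relu":
        dZ = dA * (Z > 0)          # ReLU'(Z) is 1 where Z > 0, else 0
    elif activation == "sigmoid":
        s = sigmoid(Z)
        dZ = dA * s * (1 - s)      # sigmoid'(Z) = s * (1 - s)
    else:
        raise ValueError("unknown activation")
    return dZ
```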





  1. Explanation:
    W[l] has shape (n[l], n[l-1]).
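    A minimal numpy sketch of this shape convention, using a hypothetical initialize_parameters helper and layer_dims list; it also illustrates the b[l] shape discussed in the explanations below:

```python
import numpy as np

def initialize_parameters(layer_dims):
    # layer_dims[l] is n[l]; layer_dims[0] is the input size n[0].
    parameters = {}
    for l in range(1, len(layer_dims)):
        # W[l] has shape (n[l], n[l-1]); b[l] has shape (n[l], 1).
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters

params = initialize_parameters([4, 5, 3, 1])
print(params["W2"].shape)  # (3, 5) = (n[2], n[1])
print(params["b2"].shape)  # (3, 1) = (n[2], 1)
```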





  1. Explanation:
    b[l] has shape (n[l], 1).





  1. Explanation:
    b[l] has shape (n[l], 1).





  1. Explanation:
    b[l] has shape (n[l], 1).





  1. Explanation:
    W[l] has shape (n[l], n[l-1]).





  1. Explanation:
    W[l] has shape (n[l], n[l-1]).