• When implementing a deep neural network, one of the debugging tools often used to check the correctness of the code is to work through the dimensions of the matrices involved.

• In the above figure, capital L is equal to 5, i.e. not counting the input layer, there are five layers here: four hidden layers and one output layer.

• And so if you implement forward propagation, the steps will be: $$Z^{[1]} = W^{[1]}X + b^{[1]} \\ A^{[1]} = g^{[1]}(Z^{[1]}) \\ Z^{[2]} = W^{[2]}A^{[1]} + b^{[2]} \\ A^{[2]} = g^{[2]}(Z^{[2]}) \\ Z^{[3]} = W^{[3]}A^{[2]} + b^{[3]} \\ A^{[3]} = g^{[3]}(Z^{[3]}) \\ Z^{[4]} = W^{[4]}A^{[3]} + b^{[4]} \\ A^{[4]} = g^{[4]}(Z^{[4]}) \\ Z^{[5]} = W^{[5]}A^{[4]} + b^{[5]} \\ \hat{Y} = A^{[5]} = g^{[5]}(Z^{[5]})$$
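The forward-propagation steps above can be sketched in NumPy for a single example. The layer sizes come from the figure; the random initialization and the tanh activation are assumptions for illustration (the lecture just writes $$g(\cdot)$$):

```python
import numpy as np

# Layer sizes from the figure: n[0] = 2 inputs, hidden layers of 3, 5, 4, 2 units, output 1
n = [2, 3, 5, 4, 2, 1]
L = len(n) - 1  # L = 5, not counting the input layer

rng = np.random.default_rng(0)
# W[l] is n[l] x n[l-1] and b[l] is n[l] x 1, for l = 1..L
W = {l: rng.standard_normal((n[l], n[l - 1])) for l in range(1, L + 1)}
b = {l: np.zeros((n[l], 1)) for l in range(1, L + 1)}

def g(z):
    # stand-in activation (tanh); any element-wise g(.) keeps the same shapes
    return np.tanh(z)

x = rng.standard_normal((n[0], 1))  # one training example, n[0] x 1
a = x
for l in range(1, L + 1):
    z = W[l] @ a + b[l]  # (n[l], n[l-1]) @ (n[l-1], 1) -> (n[l], 1)
    a = g(z)
    assert z.shape == a.shape == (n[l], 1)  # z[l] and a[l] are both n[l] x 1
y_hat = a
print(y_hat.shape)  # (1, 1)
```

Running the asserts at every layer is exactly the dimension-checking habit the lecture recommends.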

• Now this first hidden layer has three hidden units. So using the notation we had previously, we have that $$n^{[1]}$$, which is the number of hidden units in layer 1, is equal to 3.

• And here we would have that $$n^{[2]}$$ is equal to 5, $$n^{[3]}$$ is equal to 4, $$n^{[4]}$$ is equal to 2, and $$n^{[5]}$$ is equal to 1.

• And finally, for the input layer, we also have $$n^{[0]} = n_x = 2$$.

• So now, let's think about the dimensions of z, w, and x. $$z^{[1]}$$ is the vector of activations for this first hidden layer, so $$z^{[1]}$$ is going to be 3 by 1; it's going to be a 3-dimensional vector.

• So I'm going to write it as an $$n^{[1]}$$ by 1-dimensional vector, an $$n^{[1]}$$ by 1-dimensional matrix, all right, so 3 by 1 in this case.

• Now about the input features x, we have two input features.

• So x is in this example 2 by 1, but more generally, it would be $$n^{[0]}$$ by 1.

• So what we need is for the matrix $$w^{[1]}$$ to be something that, when we multiply it by an $$n^{[0]}$$ by 1 vector, gives us an $$n^{[1]}$$ by 1 vector.

• So you have a three dimensional vector equals something times a two dimensional vector.

• And so by the rules of matrix multiplication, this has got to be a 3 by 2 matrix, because a 3 by 2 matrix times a 2 by 1 matrix, or times a 2 by 1 vector, gives you a 3 by 1 vector.
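That shape rule can be confirmed in one line of NumPy (the all-ones values are just placeholders; only the shapes matter):

```python
import numpy as np

W1 = np.ones((3, 2))  # a 3 by 2 matrix, i.e. n[1] x n[0]
x = np.ones((2, 1))   # a 2 by 1 vector, i.e. n[0] x 1
z1 = W1 @ x           # (3, 2) @ (2, 1) -> (3, 1)
print(z1.shape)  # (3, 1)
```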

• And more generally, this is going to be an $$n^{[1]}$$ by $$n^{[0]}$$ dimensional matrix.

• So what we figured out here is that the dimensions of $$w^{[1]}$$ have to be $$n^{[1]}$$ by $$n^{[0]}$$.

• And more generally, the dimensions of $$w^{[L]}$$ must be $$n^{[L]}$$ by $$n^{[L-1]}$$. So for example, the dimensions of $$w^{[2]}$$ would have to be 5 by 3, or $$n^{[2]}$$ by $$n^{[1]}$$, because we're going to compute $$z^{[2]}$$ as $$w^{[2]}$$ times $$a^{[1]}$$ (let's ignore the bias for now).

• So the general formula to check is that when you're implementing the matrix for layer L, the dimensions of that matrix should be $$n^{[L]}$$ by $$n^{[L-1]}$$.

• Let's think about the dimension of this vector b. The general rule is that $$b^{[L]}$$ should be $$n^{[L]}$$ by 1 dimensional.
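These two rules can be turned into a small shape-checking helper; `check_dims` is a hypothetical name, and the zero-filled parameters below are just placeholders with the right shapes:

```python
import numpy as np

def check_dims(n, W, b):
    """Verify W[l] is n[l] x n[l-1] and b[l] is n[l] x 1, for l = 1..L."""
    for l in range(1, len(n)):
        assert W[l].shape == (n[l], n[l - 1]), f"W[{l}] has shape {W[l].shape}"
        assert b[l].shape == (n[l], 1), f"b[{l}] has shape {b[l].shape}"

n = [2, 3, 5, 4, 2, 1]  # layer sizes from the figure
W = {l: np.zeros((n[l], n[l - 1])) for l in range(1, len(n))}
b = {l: np.zeros((n[l], 1)) for l in range(1, len(n))}
check_dims(n, W, b)  # passes; e.g. W[2] is 5 x 3
```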

• So hopefully these two equations help you to double check that the dimensions of your matrices w, as well as your vectors b, are the correct dimensions.

• And of course, if you're implementing back propagation, then the dimensions of dw should be the same as the dimensions of w, and db should be the same dimension as b.
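This can be checked directly for a single layer; the layer sizes, batch size, and the standard one-layer backprop formulas below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_prev, n_l, m = 3, 5, 10  # made-up layer sizes and batch size
W = rng.standard_normal((n_l, n_prev))
b = np.zeros((n_l, 1))
A_prev = rng.standard_normal((n_prev, m))
dZ = rng.standard_normal((n_l, m))  # stand-in for the upstream gradient

# standard backprop formulas for one layer
dW = (dZ @ A_prev.T) / m                  # (n_l, m) @ (m, n_prev) -> (n_l, n_prev)
db = dZ.sum(axis=1, keepdims=True) / m    # (n_l, 1)

assert dW.shape == W.shape  # dw matches w
assert db.shape == b.shape  # db matches b
```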

• Now the other key set of quantities whose dimensions to check are $$z^{[L]}$$ and $$a^{[L]}$$: because $$a^{[L]} = g^{[L]}(z^{[L]})$$, applied element-wise, z and a should have the same dimension in these types of networks.