• Vectorization is basically the art of getting rid of explicit for-loops in your code.

• In the deep learning era, and especially in deep learning in practice, you often find yourself training on relatively large data sets, because that's when deep learning algorithms tend to shine.

• So it's important that your code runs very quickly. Otherwise, on a big data set, your code might take a long time to run, and you find yourself waiting a very long time to get the result.

• So in the deep learning era, I think the ability to perform vectorization has become a key skill.

• In logistic regression you need to compute $$z = w^Tx + b$$, where w is a column vector and x is also a column vector.

• These may be very large vectors if you have a lot of features. So w and x are both $$n_x$$-dimensional vectors, $$w, x \in \mathbb{R}^{n_x}$$, as shown in the figure below.

• So, to compute $$w^Tx$$ with a non-vectorized implementation, you would do something like the code given below:

• z = 0
for i in range(n_x):
    z += w[i] * x[i]
z += b


• So, that's a non-vectorized implementation, and you'll find that it's going to be really slow.

• In contrast, a vectorized implementation computes $$w^Tx$$ directly.

• In Python with NumPy, the command you use for that is z = np.dot(w, x) + b, which computes $$w^Tx$$ and then adds b.

• When you are implementing deep learning algorithms, you get results back much faster if you vectorize your code.
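• To make the comparison concrete, here is a minimal sketch that computes z both ways and times each version. The function names (dot_loop, dot_vectorized), the random test vectors, and the choice of n_x = 100000 are assumptions made for illustration. Actual timings depend on your machine, so none are quoted here.

```python
import time
import numpy as np

def dot_loop(w, x, b):
    """Non-vectorized z = w^T x + b, using an explicit for-loop."""
    z = 0.0
    for i in range(w.shape[0]):
        z += w[i] * x[i]
    return z + b

def dot_vectorized(w, x, b):
    """Vectorized z = w^T x + b, using a single np.dot call."""
    return np.dot(w, x) + b

# Hypothetical data: n_x random features (illustrative values only).
rng = np.random.default_rng(0)
n_x = 100_000
w = rng.random(n_x)
x = rng.random(n_x)
b = 0.5

tic = time.perf_counter()
z_loop = dot_loop(w, x, b)
loop_ms = 1000 * (time.perf_counter() - tic)

tic = time.perf_counter()
z_vec = dot_vectorized(w, x, b)
vec_ms = 1000 * (time.perf_counter() - tic)

# Both versions agree up to floating-point rounding.
print("same result:", np.isclose(z_loop, z_vec))
print(f"for-loop: {loop_ms:.3f} ms, np.dot: {vec_ms:.3f} ms")
```

• Both versions compute the same number; only the time taken should differ.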

• Some of you might have heard that a lot of scalable deep learning implementations are done on a GPU, or graphics processing unit. But all the demos I did just now in the Jupyter notebook were actually on the CPU.

• And it turns out that both GPUs and CPUs have parallelization instructions. They're sometimes called SIMD instructions, which stands for single instruction, multiple data.

• What this basically means is that if you use built-in functions such as np.dot, or other functions that don't require you to explicitly implement a for-loop, you enable Python and NumPy to take much better advantage of parallelism and do your computations much faster.

• And this is true both for computations on CPUs and for computations on GPUs. It's just that GPUs are remarkably good at these SIMD calculations, while CPUs are actually not too bad at them either, maybe just not as good as GPUs.

• So you've seen how vectorization can significantly speed up your code.

• The rule of thumb to remember is: whenever possible, avoid using explicit for-loops.

• The rule of thumb to keep in mind is, when you're programming your neural networks, or when you're programming just logistic regression, whenever possible avoid explicit for-loops.

• It's not always possible to avoid for-loops entirely, but when you can use a built-in function or find some other way to compute whatever you need, you'll often go faster than if you have an explicit for-loop.

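• Suppose you want to compute a vector u as the product of a matrix A and another vector v, $$u = Av$$. By the definition of matrix multiplication, $$u_i = \sum_j A_{ij} v_j$$, so a non-vectorized implementation looks like the code given below: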

• u = np.zeros((n,1))
for i .....
    for j ....
        u[i] += A[i][j] * v[j]


• So, that's the non-vectorized version. The vectorized implementation is simply u = np.dot(A, v).

• The vectorized version eliminates two different for-loops, and it's going to be way faster.
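• As a rough sketch, the two versions can be written out in full and checked against each other. The loop bounds (taken from the shape of A), the function names, and the small example matrix are assumptions made for illustration, since the notes above leave the loop ranges unspecified.

```python
import numpy as np

def matvec_loop(A, v):
    """Non-vectorized u = A v: two nested for-loops over rows and columns."""
    n, m = A.shape            # assumed: A is (n, m) and v is an (m, 1) column
    u = np.zeros((n, 1))
    for i in range(n):
        for j in range(m):
            u[i, 0] += A[i, j] * v[j, 0]
    return u

def matvec_vectorized(A, v):
    """Vectorized u = A v: one np.dot call replaces both loops."""
    return np.dot(A, v)

# Small hypothetical example: a 3x2 matrix times a 2x1 column vector.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
v = np.array([[10.0],
              [20.0]])

u_loop = matvec_loop(A, v)
u_vec = matvec_vectorized(A, v)

# Both give the column vector with entries 50, 110, 170.
print(np.allclose(u_loop, u_vec))
```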

• Let's say you already have a vector, v, in memory and you want to apply the exponential operation on every element of this vector v.

• In the non-vectorized implementation, you first initialize u to a vector of zeros, and then you have a for-loop that computes the elements one at a time.

• import math
import numpy as np
u = np.zeros((n,1))
for i in range(n):
    u[i] = math.exp(v[i])


• But it turns out that Python and NumPy have many built-in functions that allow you to compute these vectors with just a single call to a single function. So what I would do to implement this is

• import numpy as np
u = np.exp(v)

• Notice that, whereas previously you had an explicit for-loop, here a single line of code, with v as the input vector and u as the output vector, gets rid of the explicit for-loop, and the implementation will be much faster than the one needing an explicit for-loop.

• In fact, the NumPy library has many vector-valued functions.
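• A small sketch, assuming a 1-D input vector, confirms that the loop and the single np.exp call produce the same values. It also shows that np.exp accepts an array of any shape, whereas math.exp handles one scalar at a time. The helper name exp_loop and the sample arrays are illustrative assumptions.

```python
import math
import numpy as np

def exp_loop(v):
    """Element-wise exponential of a 1-D array, one element at a time."""
    u = np.zeros(v.shape[0])
    for i in range(v.shape[0]):
        u[i] = math.exp(v[i])  # math.exp takes a single scalar
    return u

v = np.array([0.0, 1.0, 2.0])
u_loop = exp_loop(v)
u_vec = np.exp(v)              # one call handles the whole vector

# np.exp also works unchanged on a matrix and keeps its shape.
M = np.array([[0.0, 1.0],
              [2.0, 3.0]])
exp_M = np.exp(M)

print(np.allclose(u_loop, u_vec), exp_M.shape)
```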

1. np.log(v) will compute the element-wise log

2. np.abs(v) computes the absolute value

3. np.maximum(v, 0) computes the element-wise maximum, taking the max of every element of v with 0

4. v**2 just takes the element-wise square of each element of v

5. 1/v takes the element-wise inverse of each element of v, that is $$\frac{1}{v}$$.
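• The functions in the list above can be tried on a small vector. This is a minimal sketch; the sample values are assumptions chosen so the results are easy to check by hand, and np.log is applied to the absolute values because the logarithm needs positive inputs.

```python
import numpy as np

# Hypothetical sample vector with a negative entry and no zeros.
v = np.array([-2.0, 0.5, 1.0, 4.0])

abs_v = np.abs(v)          # element-wise absolute value: 2, 0.5, 1, 4
max_v = np.maximum(v, 0)   # element-wise max with 0:     0, 0.5, 1, 4
sq_v = v ** 2              # element-wise square:         4, 0.25, 1, 16
inv_v = 1 / v              # element-wise inverse:        -0.5, 2, 1, 0.25
log_v = np.log(abs_v)      # element-wise natural log of the positive values

for name, arr in [("abs", abs_v), ("max", max_v), ("square", sq_v),
                  ("inverse", inv_v), ("log", log_v)]:
    print(name, arr)
```

• Each line replaces what would otherwise be a for-loop over the elements of v.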

• So, whenever you are tempted to write a for-loop take a look, and see if there's a way to call a NumPy built-in function to do it without that for-loop.