• If the basic technical ideas behind deep learning and neural networks have been around for decades, why are they only just now taking off?


  • Let's go over some of the main drivers behind the rise of deep learning, because I think this will help you spot the best opportunities within your own organization to apply these techniques.


  • Let's say we plot a figure where the horizontal axis is the amount of data we have for a task, and the vertical axis is the performance of our learning algorithm, such as the accuracy of a spam classifier.


  • It turns out that if you plot the performance of a traditional learning algorithm, like a support vector machine or logistic regression, as a function of the amount of data, you might get a curve like the one sketched below.


  • The performance improves for a while as you add more data, but after a while it pretty much plateaus.
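

  • As an illustration, here is a minimal matplotlib sketch of that conceptual figure; the curve shapes and numbers are hand-picked stand-ins, not measured results:

```python
# Illustrative sketch: performance vs. amount of data.
# The curves are hand-shaped stand-ins, not measured benchmarks.
import numpy as np
import matplotlib.pyplot as plt

data = np.linspace(0, 10, 200)  # amount of labeled data (arbitrary units)

def saturating(amount, ceiling, rate):
    """Performance that rises with data but plateaus at `ceiling`."""
    return ceiling * (1 - np.exp(-rate * amount))

plt.plot(data, saturating(data, 0.70, 1.5), label="traditional algorithm (SVM, logistic regression)")
plt.plot(data, saturating(data, 0.80, 0.9), label="small neural network")
plt.plot(data, saturating(data, 0.90, 0.6), label="medium neural network")
plt.plot(data, saturating(data, 0.98, 0.4), label="large neural network")
plt.xlabel("Amount of data")
plt.ylabel("Performance (e.g. accuracy)")
plt.legend()
plt.show()
```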


  • Over the last 20 years, for a lot of applications we have accumulated much more data, more than traditional learning algorithms were able to take advantage of effectively.


  • If you want to hit a very high level of performance, you need two things: first, you often need to be able to train a big enough neural network to take advantage of the huge amount of data, and second, you need that huge amount of data itself. So we often say that scale has been driving deep learning progress.


  • By scale we mean both the size of the neural network, i.e. a network with a lot of hidden units, a lot of parameters, and a lot of connections, as well as the scale of the data.
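

  • To make "a lot of parameters" concrete, here is a minimal sketch counting the weights and biases of a fully connected network; the layer sizes are made up purely for illustration:

```python
# Count weights and biases in a fully connected network.
# The layer sizes below are arbitrary, chosen only for illustration.
layer_sizes = [1000, 512, 256, 10]  # input, two hidden layers, output

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    total += n_in * n_out + n_out  # weight matrix plus bias vector

print(total)  # 646410 parameters, even for this modest network
```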


  • Today, one of the most reliable ways to get better performance from a neural network is often to either train a bigger network or throw more data at it. That only works up to a point, because eventually you run out of data, or the network becomes so big that it takes too long to train. But just improving scale has taken us a long way in the world of deep learning.


  • So if you don't have a lot of training data, it is often your skill at hand-engineering features that determines success.


  • We've also seen tremendous algorithmic innovation aimed at making neural networks run much faster. As a concrete example, one of the huge breakthroughs in neural networks has been switching from the sigmoid activation function to the ReLU function.


  • It turns out that one of the problems of using sigmoid functions in machine learning is that there are regions where the slope of the function is nearly zero, so learning becomes really slow: when you implement gradient descent and the gradient is nearly zero, the parameters change very slowly.
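

  • To see the effect numerically, here is a minimal sketch of one gradient descent step; for simplicity it treats the sigmoid's input as the parameter being updated, and the learning rate and starting value are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)  # derivative of the sigmoid

learning_rate = 0.1
w = 5.0                       # parameter sitting in the flat, saturated region
grad = sigmoid_grad(w)        # slope of the sigmoid at z = 5
step = -learning_rate * grad  # one gradient descent update

print(grad)  # ~0.0066: the slope is nearly zero
print(step)  # ~-0.00066: the parameter barely moves, so learning crawls
```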


  • Whereas by changing the activation function of the neural network to the ReLU function (rectified linear unit), the gradient is much less likely to gradually shrink to zero: the slope is 1 for all positive input values.
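

  • A minimal NumPy comparison of the two slopes, with input values chosen just for illustration:

```python
import numpy as np

def sigmoid_grad(z):
    s = 1.0 / (1.0 + np.exp(-z))
    return s * (1.0 - s)

def relu_grad(z):
    return (z > 0).astype(float)  # slope is 1 for positive inputs, 0 otherwise

z = np.array([-10.0, -1.0, 0.5, 1.0, 10.0])
print(sigmoid_grad(z))  # [~0.00005 ~0.20 ~0.24 ~0.20 ~0.00005]: vanishes for large |z|
print(relu_grad(z))     # [0. 0. 1. 1. 1.]: stays at 1 for every positive input
```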


  • Ultimately, the impact of this algorithmic innovation was that it allowed the code to run much faster, which in turn allows us to train bigger neural networks.


  • The process of training a neural network is iterative.


  • It can take a long time to train a neural network, which affects your productivity. Faster computation helps you iterate on and improve your algorithms more quickly.