**Sometimes people refer to the SVM as a large margin classifier.** We'll consider what that means and what an SVM hypothesis looks like. The SVM cost function is as above, and we've drawn out the cost terms below.

**Left is cost1 and right is cost0.** What does it take to make these terms small?

- If y = 1, cost1(z) = 0 only when z >= 1
- If y = 0, cost0(z) = 0 only when z <= -1

(A small numeric sketch of these cost terms appears at the end of this section.)

**Interesting property of the SVM.** If you have a positive example, you only really need z to be greater than or equal to 0 for the hypothesis to predict 1. The SVM wants a bit more than that: it doesn't want to *just* get the example right, it wants the value to be quite a bit bigger than zero, throwing in an extra safety margin factor. Logistic regression does something similar.

What are the consequences of this? Consider a case where we set C to be huge, say C = 100,000. Since we're minimizing CA + B, the A term shown in the figure below must then be driven to 0.

**If C is huge,** we're going to pick values of θ so that A equals zero. What does the optimization problem become, i.e. how do we make A = 0?

- If y = 1, then to make our "A" term 0 we need to find a value of θ so that \( \theta^T x \) is greater than or equal to 1
- Similarly, if y = 0, then to make "A" = 0 we need to find a value of θ so that \( \theta^T x \) is less than or equal to -1

**So, if we think of our optimization problem** as first ensuring that the "A" term is 0, we can re-factor it into minimizing just the "B" (regularization) term, because when A = 0, A*C = 0. So we're minimizing B under the constraints shown below (written out at the end of this section).

**Turns out,** when you solve this problem you get interesting decision boundaries.

**The green and magenta lines** are functional decision boundaries which could be chosen by logistic regression, but they probably don't generalize very well.

**The black line,** by contrast, is the one chosen by the SVM because of the safety margin imposed by the optimization. It is a more robust separator: mathematically, the black line has a larger minimum distance (margin) from any of the training examples.

**By separating with the largest margin,** you incorporate robustness into your decision-making process.

We looked at this for the case where C is very large, but the SVM is more sophisticated than this large-margin view might suggest.

**If you were just** maximizing the margin, the SVM would be very sensitive to outliers: **you would risk** letting a single example hugely impact your classification boundary, and a single example is usually not a good reason to move that boundary. If C is very large we do use this quite naive maximize-the-margin approach, **so we'd change the black boundary to the magenta one.** But if C is reasonably small, i.e. not too large, then you stick with the black decision boundary (a sketch of this C effect appears at the end of this section).

**What about non-linearly separable data?** The SVM still does the right thing if you use a normal-sized C. So the idea of the SVM as a large margin classifier is only really relevant when you have no outliers and the data is easily linearly separable; with a smaller C, we effectively ignore a few outliers.
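The notes above describe cost1 and cost0 only by their zero regions (cost1(z) = 0 for z >= 1, cost0(z) = 0 for z <= -1). Here is a minimal numeric sketch, assuming the standard hinge-style shape with those zero regions (the exact slopes in the course figures may differ):

```python
import numpy as np

def cost1(z):
    # Cost for a positive example (y = 1): zero once z >= 1,
    # growing linearly as z falls below 1 (hinge-style penalty).
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # Cost for a negative example (y = 0): zero once z <= -1,
    # growing linearly as z rises above -1.
    return np.maximum(0.0, 1.0 + z)

z = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
print(cost1(z))  # [3. 2. 1. 0. 0.]  -> zero only from z = 1 upwards
print(cost0(z))  # [0. 0. 1. 2. 3.]  -> zero only up to z = -1
```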
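Written out, the "CA + B" objective from the prose looks like this (a reconstruction in standard notation, with m training examples and n features):

\[
\min_{\theta}\; \underbrace{C \sum_{i=1}^{m}\Big[ y^{(i)}\,\mathrm{cost}_1(\theta^T x^{(i)}) + (1-y^{(i)})\,\mathrm{cost}_0(\theta^T x^{(i)}) \Big]}_{C \cdot A} \;+\; \underbrace{\frac{1}{2}\sum_{j=1}^{n}\theta_j^2}_{B}
\]

When C is huge and A is forced to 0, this reduces to minimizing B under the constraints described above:

\[
\min_{\theta}\; \frac{1}{2}\sum_{j=1}^{n}\theta_j^2
\quad \text{s.t.} \quad
\theta^T x^{(i)} \ge 1 \;\text{ if } y^{(i)} = 1,
\qquad
\theta^T x^{(i)} \le -1 \;\text{ if } y^{(i)} = 0
\]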
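Finally, here is a minimal sketch of the C-versus-outlier behaviour described above, assuming scikit-learn is available. The data set is made up for illustration: a huge C approximates the naive maximize-the-margin boundary that swings toward a single outlier, while a modest C keeps the wider, more robust separator.

```python
# A minimal sketch (assuming scikit-learn) of how C controls sensitivity
# to outliers. The data is made up: two well-separated clusters plus a
# single positive example sitting next to the negative cluster.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # y = 1 cluster
              [6.0, 6.0], [6.5, 7.0], [7.0, 6.5],   # y = 0 cluster
              [5.5, 5.5]])                           # y = 1 outlier
y = np.array([1, 1, 1, 0, 0, 0, 1])

for C in (100_000.0, 1.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    # The decision boundary is w . x + b = 0. With huge C the boundary
    # gets squeezed between the outlier and the negative cluster; with
    # C = 1 it should settle in the wide gap between the two clusters.
    print(f"C={C:g}: w={np.round(w, 3)}, b={b:.3f}")
```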