### Introductory Video

• The Machine Learning course includes several programming assignments which you’ll need to finish to complete the course. The assignments require the Octave scientific computing language.

• Octave is a free, open-source application available for many platforms. It has a text interface and an experimental graphical one. Octave is distributed under the GNU General Public License, which means that it is always free to download and distribute.

• Install Octave for Windows using the Windows installer. Warning: do not install Octave 4.0.0.

• Installing Octave on GNU/Linux: On Ubuntu, you can use: sudo apt-get update && sudo apt-get install octave. On Fedora, you can use: sudo yum install octave-forge

• Files included in this exercise:

1. ex2_reg.m - Octave/MATLAB script for the later parts of the exercise

2. ex2data1.txt - Training set for the first half of the exercise

3. ex2data2.txt - Training set for the second half of the exercise

4. mapFeature.m - Function to generate polynomial features

5. plotDecisionBoundary.m - Function to plot classifier’s decision boundary

• Files you will need to modify as part of this assignment:

1. costFunctionReg.m - Regularized Logistic Regression Cost

• In this part of the exercise, you will implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly.

• Suppose you are the product manager of the factory and you have the test results for some microchips on two different tests.

• From these two tests, you would like to determine whether the microchips should be accepted or rejected.

• To help you make the decision, you have a dataset of test results on past microchips, from which you can build a logistic regression model.

• You will use another script, ex2_reg.m, to complete this portion of the exercise.

• Similar to the previous parts of this exercise, plotData is used to generate a figure like Figure 3, where the axes are the two test scores, and the positive (y = 1, accepted) and negative (y = 0, rejected) examples are shown with different markers.

• Figure 3 shows that our dataset cannot be separated into positive and negative examples by a straight line through the plot.

• Therefore, a straightforward application of logistic regression will not perform well on this dataset since logistic regression will only be able to find a linear decision boundary.

• One way to fit the data better is to create more features from each data point. In the provided function mapFeature.m, we will map the features into all polynomial terms of x1 and x2 up to the sixth power.

• As a result of this mapping, our vector of two features (the scores on two QA tests) has been transformed into a 28-dimensional vector.
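• As a rough illustration of what such a mapping computes, the sketch below builds the polynomial terms in plain Octave. It is not the provided mapFeature.m; the sample scores and the variable names x1, x2, and F are made up for the example.

```octave
x1 = [0.5; -0.2];   % example scores on test 1 (made-up values)
x2 = [0.7; 0.1];    % example scores on test 2 (made-up values)
degree = 6;

F = ones(size(x1));                 % first column: the constant (bias) term
for i = 1:degree
  for j = 0:i
    % every term x1^(i-j) * x2^j of total degree i
    F(:, end + 1) = (x1 .^ (i - j)) .* (x2 .^ j);
  end
end
size(F)   % 2 rows (examples), 28 columns (features)
```

• The count of 28 comes from 1 constant term plus 2 + 3 + 4 + 5 + 6 + 7 terms for degrees 1 through 6.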

• A logistic regression classifier trained on this higher-dimension feature vector will have a more complex decision boundary and will appear nonlinear when drawn in our 2-dimensional plot.

• While the feature mapping allows us to build a more expressive classifier, it is also more susceptible to overfitting.

• In the next parts of the exercise, you will implement regularized logistic regression to fit the data and also see for yourself how regularization can help combat the overfitting problem.

• Now you will implement code to compute the cost function and gradient for regularized logistic regression.

• Complete the code in costFunctionReg.m to return the cost and gradient.

• Recall that the regularized cost function in logistic regression is

• $$J(\theta) = - \frac{1}{m} \sum_{i=1}^m \left[ y^{(i)}\ \log (h_\theta (x^{(i)})) + (1 - y^{(i)})\ \log (1 - h_\theta(x^{(i)})) \right] + \frac{\lambda}{2m}\sum_{j=1}^n \theta_j^2$$

• Note that you should not regularize the parameter θ0. In Octave/MATLAB, recall that indexing starts from 1, hence, you should not be regularizing the theta(1) parameter (which corresponds to θ0) in the code.

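• The gradient of the cost function is a vector where the j-th element is defined as follows:

• $$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)} \qquad \text{for } j = 0$$

• $$\frac{\partial J(\theta)}{\partial \theta_j} = \left( \frac{1}{m} \sum_{i=1}^m \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)} \right) + \frac{\lambda}{m}\theta_j \qquad \text{for } j \ge 1$$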
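• A minimal vectorized sketch of this gradient: the unregularized part is X' * (h - y) / m, and (lambda / m) * theta_j is added for every parameter except the first. The data are the small test matrices used later in this section; the variable names are assumptions, and this is one possible arrangement rather than the graded solution.

```octave
X = [ones(3,1) magic(3)];
y = [1 0 1]';
theta = [-2 -1 1 2]';
lambda = 4;
m = length(y);

h = 1 ./ (1 + exp(-(X * theta)));   % hypothesis, size (m x 1)
grad = (X' * (h - y)) / m;          % unregularized gradient, one entry per parameter
reg = (lambda / m) * theta;         % regularization for every theta_j ...
reg(1) = 0;                         % ... except theta(1), which corresponds to theta_0
grad = grad + reg
% approximately [0.31722; -0.46102; 2.98146; 4.90454]
```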

• Once you are done, ex2_reg.m will call your costFunctionReg function using the initial value of θ (initialized to all zeros).

• You should see that the cost is about 0.693.
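• The 0.693 figure can be checked by hand. With θ all zeros, every prediction is sigmoid(0) = 0.5, so each example contributes -log(0.5) whatever its label, and the regularization term is zero. A quick sketch of that arithmetic (h0 and J0 are illustrative names):

```octave
h0 = 1 / (1 + exp(-0));   % sigmoid(0) = 0.5, the prediction for every example
J0 = -log(h0)             % log(2), about 0.6931
```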

• The hypothesis is a vector, formed from the sigmoid() of the products of X and $\theta$.

• Be sure your sigmoid() function passes the submit grader before going any further.
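• One possible way to form the hypothesis vector is sketched below. The anonymous function sigmoid_sketch is a hypothetical stand-in for your own sigmoid.m, and the X and theta values are borrowed from the test case further down.

```octave
sigmoid_sketch = @(z) 1 ./ (1 + exp(-z));   % element-wise logistic function (stand-in)

X = [ones(3,1) magic(3)];
theta = [-2 -1 1 2]';
h = sigmoid_sketch(X * theta)   % (3 x 1) vector, one probability per example
```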

• First focus on the circled portions of the cost equation. Each of these is a vector of size (m x 1). In the steps below we'll distribute the summation operation, as shown in purple, so we end up with two scalars (for the 'red' and 'blue' calculations).

• The red-circled term is the sum of -y multiplied by the natural log of h. Note that the natural log function is log(). Don't use log10(). Since we want the sum of the products, we can use a vector multiplication. The size of each argument is (m x 1), and we want the vector product to be a scalar, so use a transposition so that (1 x m) times (m x 1) gives a result of (1 x 1), a scalar.

• The blue-circled term uses the same method, except that the two vectors are (1 - y) and the natural log of (1 - h).

• Subtract the right-side term from the left-side term. Scale the result by 1/m. This is the unregularized cost.
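• Put together, the two terms and the scaling might look like the following sketch. The names left, right, and J_unreg are assumptions for illustration; the data come from the test case further down.

```octave
X = [ones(3,1) magic(3)];
y = [1 0 1]';
theta = [-2 -1 1 2]';
m = length(y);

h = 1 ./ (1 + exp(-(X * theta)));   % hypothesis, (m x 1)
left = -y' * log(h);                % (1 x m) * (m x 1) gives a scalar
right = (1 - y)' * log(1 - h);      % same method with (1 - y) and log(1 - h)
J_unreg = (left - right) / m        % about 4.6832
```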

• Now we have only the regularization term remaining. We want the regularization to exclude the bias feature, so we can set theta(1) to zero. Since we already calculated h, and theta is a local variable, we can modify theta(1) without causing any problems.

• Now we need to calculate the sum of the squares of theta. Since we've set theta(1) to zero, we can square the entire theta vector. If we vector-multiply theta by itself, we will calculate the sum automatically. So use the same method we used for the red-circled and blue-circled terms to multiply theta by itself with a transposition.

• Now scale the cost regularization term by (lambda / (2 * m)). Be sure you use enough sets of parentheses to get the correct result. Special note for those whose cost value is too high: 1/(2*m) and (1/2*m) give drastically different results.
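• A short sketch of the regularization term and of the parenthesis pitfall, using theta and lambda from the test case further down with m = 3. The names reg, wrong_scale, and right_scale are illustrative only.

```octave
theta = [-2 -1 1 2]';
lambda = 4;
m = 3;

theta(1) = 0;                                 % exclude the bias parameter
reg = (lambda / (2 * m)) * (theta' * theta)   % (4/6) * 6, about 4

wrong_scale = (1/2*m)   % 1.5, because this is evaluated as (1/2) * m
right_scale = 1/(2*m)   % 0.16667
```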

• Test case for costFunctionReg:

```octave
X = [ones(3,1) magic(3)];
y = [1 0 1]';
theta = [-2 -1 1 2]';

% unregularized (lambda = 0)
[j g] = costFunctionReg(theta, X, y, 0)

% regularized (lambda = 4)
[j g] = costFunctionReg(theta, X, y, 4)
% note: also works for ex3 lrCostFunction(theta, X, y, 4)

% results of the regularized call
j =  8.6832
g =

   0.31722
  -0.46102
   2.98146
   4.90454
```


• Similar to the previous parts, you will use fminunc to learn the optimal parameters θ.

• If you have completed the cost and gradient for regularized logistic regression (costFunctionReg.m) correctly, you should be able to step through the next part of ex2_reg.m to learn the parameters θ using fminunc.
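• For orientation, here is a self-contained sketch of how fminunc can be driven by a function that returns both the cost and the gradient. The function cost_sketch, the six-point dataset, and lambda = 1 are assumptions made for the example; ex2_reg.m itself calls your costFunctionReg on the mapped microchip data. The leading 1; makes Octave treat the file as a script so that a function can be defined inline.

```octave
1;  % script marker (Octave): allows the function definition below

function [J, grad] = cost_sketch(theta, X, y, lambda)
  % Hypothetical regularized logistic regression cost and gradient
  m = length(y);
  h = 1 ./ (1 + exp(-(X * theta)));
  t = theta;
  t(1) = 0;                                   % do not regularize the bias term
  J = (-y' * log(h) - (1 - y)' * log(1 - h)) / m + (lambda / (2 * m)) * (t' * t);
  grad = (X' * (h - y)) / m + (lambda / m) * t;
end

% Made-up data: a bias column plus two features
X = [ones(6, 1) [0.1 0.2; 0.4 0.3; 0.3 0.1; 0.8 0.9; 0.7 0.6; 0.9 0.8]];
y = [0 0 0 1 1 1]';
lambda = 1;

initial_theta = zeros(size(X, 2), 1);
% 'GradObj' 'on' tells fminunc that the function also returns the gradient
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, J, exit_flag] = fminunc(@(t) cost_sketch(t, X, y, lambda), initial_theta, options);
printf('final cost: %f\n', J);
```

• The anonymous function fixes X, y, and lambda so that fminunc only sees a function of θ.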

• To help you visualize the model learned by this classifier, we have provided the function plotDecisionBoundary.m which plots the (non-linear) decision boundary that separates the positive and negative examples.

• In plotDecisionBoundary.m, we plot the non-linear decision boundary by computing the classifier’s predictions on an evenly spaced grid and then drawing a contour plot of where the predictions change from y = 0 to y = 1.
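• The grid-and-contour idea can be sketched as follows. This is not the provided plotDecisionBoundary.m: the grid range, the variable names, and the placeholder theta (chosen so that the boundary is the unit circle 1 - x1^2 - x2^2 = 0) are all assumptions. Since sigmoid(0) = 0.5, the prediction flips where the linear score is zero, so the contour is drawn at level 0.

```octave
theta = zeros(28, 1);            % placeholder parameters, not learned values
theta(1) = 1;                    % constant term
theta(4) = -1;                   % coefficient of x1^2
theta(6) = -1;                   % coefficient of x2^2

u = linspace(-1, 1.5, 50);       % grid for the first test score
v = linspace(-1, 1.5, 50);       % grid for the second test score
z = zeros(length(u), length(v));
for a = 1:length(u)
  for b = 1:length(v)
    f = 1;                       % polynomial terms up to degree 6 for this grid point
    for i = 1:6
      for j = 0:i
        f(end + 1) = (u(a) ^ (i - j)) * (v(b) ^ j);
      end
    end
    z(a, b) = f * theta;         % linear score before the sigmoid
  end
end
contour(u, v, z', [0 0]);        % single contour line at score 0, the decision boundary
```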

• After learning the parameters θ, the next step in ex2_reg.m will plot a decision boundary similar to Figure 4.