If you have 10,000,000 examples, how would you split the train/dev/test set?
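As context for this question: with millions of examples, the dev and test sets only need to be large enough to give a reliable estimate of performance, so the split skews heavily toward training (e.g. roughly 98/1/1 rather than the classic 60/20/20). A minimal sketch of such a split, assuming a hypothetical helper `train_dev_test_split` with illustrative fraction defaults:

```python
import numpy as np

def train_dev_test_split(X, y, dev_frac=0.01, test_frac=0.01, seed=0):
    """Shuffle, then carve off small dev/test sets; with ~10M examples
    a ~98/1/1 split is a common choice (dev_frac and test_frac are
    illustrative, not prescribed by the quiz)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_dev = int(len(X) * dev_frac)
    n_test = int(len(X) * test_frac)
    dev_idx = idx[:n_dev]
    test_idx = idx[n_dev:n_dev + n_test]
    train_idx = idx[n_dev + n_test:]
    return ((X[train_idx], y[train_idx]),
            (X[dev_idx], y[dev_idx]),
            (X[test_idx], y[test_idx]))

# Small toy dataset standing in for the 10,000,000 examples.
X = np.arange(10_000).reshape(1_000, 10)
y = np.arange(1_000)
train, dev, test = train_dev_test_split(X, y)
```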
The dev and test sets should:
If your Neural Network model seems to have high variance, which of the following would be promising things to try?
You are working on an automated check-out kiosk for a supermarket, and are building a classifier for apples, bananas and oranges. Suppose your classifier obtains a training set error of 0.5%, and a dev set error of 7%. Which of the following are promising things to try to improve your classifier?
What is weight decay?
What happens when you increase the regularization hyperparameter lambda?
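For reference on the two questions above: weight decay is L2 regularization, where the penalty term (lambda / 2m) * sum of ||W||^2 is added to the cost, and its gradient adds a (lambda / m) * W term that shrinks the weights each update. A minimal sketch (function names are illustrative, not from the course code):

```python
import numpy as np

def l2_regularized_cost(cross_entropy_cost, weights, lambd, m):
    """Cost plus the L2 penalty (lambda / (2m)) * sum of squared weights."""
    l2 = (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)
    return cross_entropy_cost + l2

def weight_decay_update(W, dW, lambd, m, lr):
    """Gradient step with the extra (lambda / m) * W term from the L2
    penalty; the weights are multiplied by (1 - lr * lambda / m) each
    step, i.e. they 'decay' toward zero as lambda grows."""
    return W - lr * (dW + (lambd / m) * W)
```

Increasing lambda makes the decay factor stronger, pushing weights closer to zero and the model toward higher bias / lower variance.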
With the inverted dropout technique, at test time:
Increasing the parameter keep_prob from (say) 0.5 to 0.6 will likely cause the following:
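For reference on the two dropout questions above: inverted dropout zeroes each unit with probability 1 - keep_prob during training and divides the survivors by keep_prob, so the expected activation is unchanged and no scaling is needed at test time. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def inverted_dropout_forward(A, keep_prob, rng):
    """Training-time forward pass for one layer's activations A.
    Each unit is kept with probability keep_prob; survivors are scaled
    by 1 / keep_prob so E[output] == A. At test time dropout is simply
    not applied (no mask, no scaling)."""
    D = (rng.random(A.shape) < keep_prob).astype(A.dtype)
    return (A * D) / keep_prob, D

rng = np.random.default_rng(0)
A = np.ones((1000, 100))
A_drop, mask = inverted_dropout_forward(A, 0.8, rng)
```

Raising keep_prob from 0.5 to 0.6 drops fewer units, i.e. weakens the regularization effect.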
Which of these techniques are useful for reducing variance (reducing overfitting)?
Why do we normalize the inputs x?
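As background for this question: normalizing subtracts the per-feature mean and divides by the per-feature standard deviation, putting all features on a similar scale so the cost surface is less elongated and gradient descent converges faster. A minimal sketch (the function name and the small epsilon guard are illustrative):

```python
import numpy as np

def normalize_inputs(X):
    """Zero-center each feature and scale to unit variance.
    Returns mu and sigma so the same transform can be reused
    on the dev and test sets."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-8  # epsilon guards against constant features
    return (X - mu) / sigma, mu, sigma

X = np.arange(12.0).reshape(4, 3)
X_norm, mu, sigma = normalize_inputs(X)
```

Note that mu and sigma computed on the training set should be reused to normalize the dev and test sets, rather than recomputed on each set.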