True
False
Test Accuracy 97%; Runtime 1 sec; Memory size 3 MB
Test Accuracy 98%; Runtime 9 sec; Memory size 9 MB
Explanation: Correct! As soon as the runtime is under 10 seconds, you're good. So you can simply maximize the test accuracy once you have made sure the runtime is under 10 seconds.
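As a concrete illustration, here is a minimal Python sketch of that selection rule; the model entries are hypothetical placeholders, not the question's actual table. Filter on the satisficing constraint first, then maximize the optimizing metric among the survivors.

```python
# Hypothetical model stats (placeholders, not the quiz's full table).
models = [
    {"name": "A", "accuracy": 0.97, "runtime_s": 1, "memory_mb": 3},
    {"name": "B", "accuracy": 0.98, "runtime_s": 9, "memory_mb": 9},
    {"name": "C", "accuracy": 0.99, "runtime_s": 13, "memory_mb": 9},
]

# Satisficing metric: keep only models with runtime under 10 seconds.
feasible = [m for m in models if m["runtime_s"] < 10]

# Optimizing metric: among those, pick the highest test accuracy.
best = max(feasible, key=lambda m: m["accuracy"])
print(best["name"])  # -> "B" (C is more accurate but fails the constraint)
```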
Accuracy is a satisficing metric; running time and memory size are optimizing metrics.
Accuracy, running time and memory size are all optimizing metrics because you want to do well on all three.
Accuracy is an optimizing metric; running time and memory size are satisficing metrics.
Accuracy, running time and memory size are all satisficing metrics because you have to do sufficiently well on all three for your system to be acceptable.
Train - 3,333,334; Dev - 3,333,333; Test - 3,333,333
Train - 9,500,000; Dev - 250,000; Test - 250,000
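A minimal NumPy sketch of carving out the second split above (X and y are assumed to be index-aligned arrays; the seed is arbitrary). At this scale, 250,000 dev and test examples are plenty to detect meaningful differences between models, so the bulk of the data can go to training.

```python
import numpy as np

n = 10_000_000
rng = np.random.default_rng(0)   # fixed seed for a reproducible split
idx = rng.permutation(n)         # shuffle before splitting

train_idx = idx[:9_500_000]              # 9,500,000 training examples
dev_idx   = idx[9_500_000:9_750_000]     # 250,000 dev examples
test_idx  = idx[9_750_000:]              # 250,000 test examples
```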
True
False
Explanation: Adding this data to the training set will change the training-set distribution. However, having different training and dev distributions is not a problem. On the contrary, having different dev and test set distributions would be very problematic.
The 1,000,000 citizens’ images do not have a consistent x → y mapping with the rest of the data (similar to the New York City/Detroit housing prices example from lecture).
The test set no longer reflects the distribution of data (security cameras) you most care about.
A bigger test set will slow down the speed of iterating because of the computational expense of evaluating models on the test set.
This would cause the dev and test set distributions to become different. This is a bad idea because you’re not aiming where you want to hit.
Yes, because having 4.0% training error shows you have high bias.
Yes, because this shows your bias is higher than your variance.
No, because this shows your variance is higher than your bias.
No, because there is insufficient information to tell.
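The reasoning behind "insufficient information": the 4.0% training error alone tells you neither quantity, because avoidable bias needs an estimate of Bayes (≈ human-level) error and variance needs the dev error. As a reminder, the course defines:

```latex
\[
\text{avoidable bias} = \text{training error} - \text{Bayes error},
\qquad
\text{variance} = \text{dev error} - \text{training error}.
\]
```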
0.0% (because it is impossible to do better than this)
0.3% (accuracy of expert #1)
0.4% (average of 0.3 and 0.5)
0.75% (average of all four numbers above)
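Why the single best expert is the right choice: measured human error can only overestimate Bayes error, so the lowest observed human error gives the tightest available bound:

```latex
\[
\text{Bayes error} \;\le\; \min_i \, \text{error}(\text{human}_i) \;=\; 0.3\%.
\]
```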
A learning algorithm’s performance can be better than human-level performance but it can never be better than Bayes error.
A learning algorithm’s performance can never be better than human-level performance but it can be better than Bayes error.
A learning algorithm’s performance can never be better than human-level performance nor better than Bayes error.
A learning algorithm’s performance can be better than human-level performance and better than Bayes error.
Try decreasing regularization.
Get a bigger training set to reduce variance.
Train a bigger model to try to do better on the training set.
Try increasing regularization.
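For concreteness, a minimal sketch of the two bias-reduction moves above, using scikit-learn's MLPClassifier purely for brevity (the course itself is framework-agnostic): grow the model's capacity and weaken the L2 penalty.

```python
from sklearn.neural_network import MLPClassifier

# Baseline: a small, heavily regularized model.
small_model = MLPClassifier(hidden_layer_sizes=(64,), alpha=1e-2)

# Bias-reduction moves: bigger model, less regularization.
bigger_model = MLPClassifier(
    hidden_layer_sizes=(256, 256),  # more capacity, to fit the training set better
    alpha=1e-4,                     # weaker L2 penalty (decreased regularization)
)
# bigger_model.fit(X_train, y_train)  # X_train, y_train assumed defined
```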
You should try to get a bigger dev set.
You have underfit to the dev set.
You should get a bigger test set.
You have overfit to the dev set.
This is a statistical anomaly (or must be the result of statistical noise) since it should not be possible to surpass human-level performance.
If the test set is big enough for the 0.05% error estimate to be accurate, this implies Bayes error is ≤ 0.05%.
With only 0.09% further progress to make, you should quickly be able to close the remaining gap to 0%.
It is now harder to measure avoidable bias, thus progress will be slower going forward.
Needing two weeks to train will limit the speed at which you can iterate.
If 100,000,000 examples is enough to build a good enough Cat detector, you might be better off training with just 10,000,000 examples to gain a ≈10x improvement in how quickly you can run experiments, even if each model performs a bit worse because it’s trained on less data.
Buying faster computers could speed up your team’s iteration speed and thus your team’s productivity.
Having built a good Bird detector, you should be able to take the same model and hyperparameters and just apply it to the Cat dataset, so there is no need to iterate.
Use the data you have to define a new evaluation metric (using a new dev/test set) taking into account the new species, and use that to drive further progress for your team.
Put the 1,000 images into the training set so as to try to do better on these birds.
Rethink the appropriate metric for this task, and ask your team to tune to the new metric.
Look at all the models you’ve developed during the development process and find the one with the lowest false negative error rate.
Ask your team to take into account both accuracy and false negative rate during development.
Pick false negative rate as the new metric, and use this new metric to drive all further development.
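Since several options above hinge on false negatives, here is a minimal sketch of the metric in question; the array names are illustrative (1 = bird present).

```python
import numpy as np

def false_negative_rate(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Fraction of actual birds the detector misses."""
    actual_birds = y_true == 1
    missed = actual_birds & (y_pred == 0)   # false negatives
    return missed.sum() / actual_birds.sum()

y_true = np.array([1, 1, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1])
print(false_negative_rate(y_true, y_pred))  # 1 missed bird out of 3 -> 0.333...
```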