  1. Explanation:
    As discussed in the lectures, applied ML is a highly iterative process. If you train a basic model and carry out error analysis (see what mistakes it makes), the results will help point you in more promising directions.





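  1. Explanation:
    Softmax would be a good choice only if one and only one of the possibilities (stop sign, speed bump, pedestrian crossing, green light and red light) were present in each image. Here several of them can appear in the same image, so each output should use its own sigmoid activation instead.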
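
A minimal NumPy sketch contrasting the two output activations, assuming five output units in the order listed above and made-up logits for an image that contains two of the objects. Softmax forces the outputs to sum to 1, so the objects compete for probability; independent sigmoids score each object on its own.

```python
import numpy as np

# Output order assumed to follow the list in the explanation
LABELS = ["stop sign", "speed bump", "pedestrian crossing", "green light", "red light"]

def softmax(z):
    # Subtract the max for numerical stability; the outputs always sum to 1
    e = np.exp(z - np.max(z))
    return e / e.sum()

def sigmoid(z):
    # Element-wise: each output is an independent probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Made-up logits for an image containing a stop sign AND a pedestrian crossing
logits = np.array([4.0, -3.0, 4.0, -3.0, -3.0])

p_softmax = softmax(logits)  # the two present objects compete for probability mass
p_sigmoid = sigmoid(logits)  # each object is scored on its own

# Multi-label prediction: threshold every sigmoid output independently
predicted = [name for name, p in zip(LABELS, p_sigmoid) if p > 0.5]
print(predicted)
```

Here the softmax gives each of the two present objects just under 0.5, so a 0.5 threshold on the softmax output would detect neither, while the sigmoid outputs flag both.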





  1. Explanation:
    Focus on the images that the algorithm got wrong. A sample of 500 of them is enough to give you a good initial sense of the error statistics; there is probably no need to look at 10,000, which would take a long time.
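
A minimal sketch of that error analysis, assuming a hypothetical list of IDs for the images the model got wrong and a hypothetical `review` callable that stands in for a person inspecting each image and tagging its error categories. The toy data at the end is fabricated purely to make the sketch runnable.

```python
import random
from collections import Counter

def error_analysis(misclassified, review, sample_size=500, seed=0):
    """Tally error categories over a random sample of misclassified examples.

    misclassified: list of IDs of examples the model got wrong.
    review: callable mapping an ID to a list of error categories; it stands
            in for a person looking at the image (hypothetical).
    """
    rng = random.Random(seed)
    sample = rng.sample(misclassified, min(sample_size, len(misclassified)))
    counts = Counter()
    for example_id in sample:
        # An image can fall into several categories at once (e.g. foggy and rainy)
        counts.update(set(review(example_id)))
    # Fraction of the sampled errors that involve each category, most common first
    return {category: n / len(sample) for category, n in counts.most_common()}

# Toy usage with fabricated categories, for illustration only
fake_ids = list(range(10_000))

def fake_review(i):
    if i % 4 == 0:
        return ["foggy"]
    if i % 10 == 1:
        return ["mislabeled"]
    return ["other"]

result = error_analysis(fake_ids, fake_review)
print(result)
```

Because one image can carry several tags, the fractions need not sum to 1 in general; error categories are not mutually exclusive. In this toy example each image gets exactly one tag, so they do.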





  1. Explanation:
    As seen in the lecture on multi-task learning, you can compute the cost so that it is not influenced by the fact that some entries haven’t been labeled: sum the loss only over the entries that do have a label.
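
A minimal NumPy sketch of such a cost, assuming labels are stored in an (m, k) array where `np.nan` stands for an unlabeled "?" entry (the NaN encoding is an assumption, not something the course prescribes) and a per-output binary cross-entropy loss. The loss is summed only over labeled entries and then divided by the number of examples m.

```python
import numpy as np

def masked_bce_cost(y_hat, y):
    """Binary cross-entropy summed only over labeled entries, averaged over examples.

    y_hat: (m, k) array of predicted probabilities.
    y:     (m, k) array of labels in {0, 1}, with np.nan marking an unlabeled ("?") entry.
    """
    mask = ~np.isnan(y)                   # True where a label exists
    y_filled = np.where(mask, y, 0.0)     # placeholder for "?"; zeroed out by the mask below
    eps = 1e-12
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    loss = -(y_filled * np.log(y_hat) + (1 - y_filled) * np.log(1 - y_hat))
    # Unlabeled entries contribute nothing; divide by the number of examples m
    return np.sum(loss * mask) / y.shape[0]

y = np.array([[1.0, np.nan, 0.0],
              [np.nan, 1.0, np.nan]])
y_hat = np.array([[0.9, 0.2, 0.1],
                  [0.5, 0.8, 0.99]])
print(masked_bce_cost(y_hat, y))
```

Changing the predictions at the "?" positions leaves the cost unchanged, which is exactly the property the explanation asks for.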





  1. Explanation:
    Yes. As seen in lecture, it is important that your dev and test sets have the closest possible distribution to the “real” data. It is also important for the training set to contain enough “real” data to avoid a data-mismatch problem.
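
A minimal sketch of this kind of split, assuming two hypothetical pools: abundant examples from a different distribution (for instance internet images) and scarce examples from the distribution you care about (for instance images from the car's camera). Dev and test are drawn only from the target pool, and the leftover target examples join the training set. The function name and all sizes are illustrative placeholders, not figures from the quiz.

```python
import random

def split_for_target_distribution(other_data, target_data, dev_size, test_size, seed=0):
    """Build train/dev/test sets whose dev and test come only from the target distribution.

    other_data:  abundant examples from a different distribution (e.g. internet images).
    target_data: scarce examples from the distribution you care about (e.g. car camera images).
    """
    if dev_size + test_size > len(target_data):
        raise ValueError("not enough target-distribution examples for dev and test")
    rng = random.Random(seed)
    target = list(target_data)
    rng.shuffle(target)  # dev and test are drawn from the same shuffled pool
    dev = target[:dev_size]
    test = target[dev_size:dev_size + test_size]
    # Leftover target examples go into training to reduce data mismatch
    train = list(other_data) + target[dev_size + test_size:]
    rng.shuffle(train)
    return train, dev, test

# Hypothetical sizes, for illustration only
internet = [("internet", i) for i in range(9_000)]
car = [("car", i) for i in range(1_000)]
train, dev, test = split_for_target_distribution(internet, car, dev_size=100, test_size=100)
print(len(train), len(dev), len(test))
```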









  1. Explanation:
    The algorithm does better on the distribution of data it trained on, but you don’t know whether that is because it trained on that distribution or because that distribution really is easier. To get a better sense, measure human-level error separately on both distributions.
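
A minimal sketch of how the two human-level measurements could be used, assuming human-level error on each distribution is a fair proxy for how hard that distribution is. The function name and the split into two terms are illustrative, not from the course, and the numbers are made up.

```python
def decompose_gap(model_err_a, model_err_b, human_err_a, human_err_b):
    """Split the model's error difference between two distributions.

    A is the distribution the model trained on, B the other one; all values are error rates.
    Returns (harder_by, mismatch), which add up to model_err_b - model_err_a:
      harder_by: how much harder B is than A for humans
      mismatch:  how much further the model falls short of human level on B than on A
    """
    harder_by = human_err_b - human_err_a
    mismatch = (model_err_b - human_err_b) - (model_err_a - human_err_a)
    return harder_by, mismatch

# Made-up numbers: the same 4-point model gap can have two different explanations
print(decompose_gap(0.02, 0.06, 0.01, 0.05))  # B is harder for humans too
print(decompose_gap(0.02, 0.06, 0.01, 0.01))  # humans find B as easy as A
```

Without the human-level numbers, both calls above look identical from the model's errors alone, which is why measuring human-level error on each distribution is needed.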





  1. Explanation:
    Correct, this is the most appropriate decision in this situation.