Multi-task learning refers to having one neural network do several tasks simultaneously.
When to use multi-task learning
Training on a set of tasks that could benefit from having shared lower-level features
Usually, the amount of data you have for each task is quite similar
Can train a big enough neural network to do well on all tasks
Example: Simplified autonomous vehicle
The vehicle has to detect several things simultaneously: pedestrians, cars, road signs, traffic lights, cyclists, etc. In this simplified example, we focus on four of these object types.
We could have trained four separate neural networks, instead of training one network to do four tasks.
However, in this case the system performs better when a single neural network is trained to do all four tasks than when four separate networks are trained, because some of the earlier features in the network can be shared across the different types of objects.
The input x^{(i)} is an image that may contain several of these objects, so a single example can have multiple positive labels.
The output y^{(i)} is a vector of 4 binary labels, one per object type, for example: \( y^{(i)} = \begin{bmatrix} 0\\ 0\\ 1\\ 0 \end{bmatrix} \)
Neural Network architecture
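The notes do not spell out the architecture here, but the idea is a network with shared lower layers feeding a 4-unit sigmoid output layer, one unit per object type. Below is a minimal NumPy sketch under assumed layer sizes; the names `forward` and `sigmoid` and the dimensions `n_x`, `n_h` are illustrative, not from the course.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative sizes: n_x input features (flattened image), one shared
# hidden layer, 4 output units (one per object type).
n_x, n_h, n_y = 1024, 64, 4
rng = np.random.default_rng(0)
params = {
    "W1": rng.standard_normal((n_h, n_x)) * 0.01, "b1": np.zeros((n_h, 1)),
    "W2": rng.standard_normal((n_y, n_h)) * 0.01, "b2": np.zeros((n_y, 1)),
}

def forward(X, params):
    """Shared hidden layer, then 4 independent sigmoid outputs.

    X has shape (n_x, m); the returned Y_hat has shape (4, m), one
    probability per object type per example.
    """
    A1 = np.tanh(params["W1"] @ X + params["b1"])      # shared features
    Y_hat = sigmoid(params["W2"] @ A1 + params["b2"])  # 4 sigmoid units
    return Y_hat
```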
To train this neural network, the cost function is defined as follows:
\( J(w,b) = \frac{1}{m} \sum_{i = 1}^{m} \sum_{j=1}^{4} L(\hat{y}_j^{(i)}, y_j^{(i)}) = - \frac{1}{m} \sum_{i = 1}^{m} \sum_{j=1}^{4} \left[ y_j^{(i)} \log \hat{y}_j^{(i)} + (1 - y_j^{(i)}) \log (1 - \hat{y}_j^{(i)}) \right] \)
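A minimal NumPy sketch of this cost, assuming fully labeled data and predictions/labels stored as (4, m) arrays; the function name `multitask_cost` and the `eps` clipping constant are illustrative, not from the course:

```python
import numpy as np

def multitask_cost(Y_hat, Y, eps=1e-12):
    """J(w, b) for multi-task logistic outputs.

    Y_hat, Y: arrays of shape (4, m) -- predicted probabilities and 0/1
    labels. Sums the logistic loss over the 4 tasks, averages over m.
    """
    m = Y.shape[1]
    Y_hat = np.clip(Y_hat, eps, 1 - eps)  # avoid log(0)
    losses = -(Y * np.log(Y_hat) + (1 - Y) * np.log(1 - Y_hat))
    return losses.sum() / m
```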
Also, the cost can be computed so that it is not affected by some entries being unlabeled: for each example, the inner sum over j only includes the labels that are actually provided (the entries that are 0 or 1, not "?").
Example of such a partially labeled label matrix, where each column is the label vector \( y^{(i)} \) of one training example and "?" marks a missing label:
\(
\begin{bmatrix}
1 & 0 & ? & ? \\
0 & 1 & ? & 0\\
0 & 1 & ? & 1\\
? & 0 & 1 & 0
\end{bmatrix}
\)
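One way to sketch this "ignore missing labels" computation in NumPy is to store the "?" entries as NaN and restrict the sum to the labeled positions; the helper name `masked_multitask_cost` and the NaN encoding are assumptions for illustration.

```python
import numpy as np

def masked_multitask_cost(Y_hat, Y, eps=1e-12):
    """Same cost as above, but NaN entries of Y ("?") are skipped,
    so missing labels contribute nothing to the sum over j."""
    m = Y.shape[1]
    labeled = ~np.isnan(Y)                  # positions with a 0/1 label
    Y_hat = np.clip(Y_hat, eps, 1 - eps)    # avoid log(0)
    Y_filled = np.where(labeled, Y, 0.0)    # placeholder at "?" positions
    losses = -(Y_filled * np.log(Y_hat) + (1 - Y_filled) * np.log(1 - Y_hat))
    return losses[labeled].sum() / m

# Label matrix from the example above, with np.nan standing for "?"
# (columns are examples, rows are the 4 object types).
Y = np.array([[1,      0, np.nan, np.nan],
              [0,      1, np.nan, 0],
              [0,      1, np.nan, 1],
              [np.nan, 0, 1,      0]])
```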