1. Explanation:

    We index into the ith row first to get the ith training example (denoted by the parentheses), then into the jth column to get the jth word (denoted by the angle brackets).
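
A small sketch of this indexing order, assuming the dataset is stored as a list of tokenized sentences. The sample data and the `word_at` helper are hypothetical, and Python indices are 0-based whereas the notation above is typically 1-based:

```python
# Hypothetical toy dataset: each row is one training example (a sentence),
# and each position within a row is one word.
X = [
    ["the", "cat", "sat"],
    ["dogs", "bark", "loudly"],
]

def word_at(X, i, j):
    """Return the j-th word of the i-th training example (0-based here)."""
    example = X[i]      # row first: the i-th training example
    return example[j]   # then column: the j-th word of that example

print(word_at(X, 1, 2))  # loudly
```

Swapping the two indices would select a different word or raise an `IndexError`, which is why the order (example first, word second) matters.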

  1. Explanation:

    This architecture is appropriate when every input step must be matched to a corresponding output step.
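
One way to picture "every input matched to an output" is a plain recurrent network that emits one output per input step. The sketch below is illustrative only: it assumes NumPy is available, and the function name, weight names, and sizes are hypothetical rather than taken from the quiz.

```python
import numpy as np

def rnn_forward(xs, Wax, Waa, Wya, ba, by):
    """Run a plain RNN and emit one output per input step (T_y == T_x)."""
    a = np.zeros(Waa.shape[0])               # initial hidden state
    ys = []
    for x in xs:                             # one iteration per input step
        a = np.tanh(Wax @ x + Waa @ a + ba)  # update hidden state
        ys.append(Wya @ a + by)              # emit an output for this step
    return np.stack(ys)

# Hypothetical sizes: input dim, hidden units, output dim, sequence length.
n_x, n_a, n_y, T_x = 4, 5, 3, 6
rng = np.random.default_rng(0)
xs = rng.normal(size=(T_x, n_x))
ys = rnn_forward(
    xs,
    Wax=rng.normal(size=(n_a, n_x)),
    Waa=rng.normal(size=(n_a, n_a)),
    Wya=rng.normal(size=(n_y, n_a)),
    ba=np.zeros(n_a),
    by=np.zeros(n_y),
)
print(ys.shape)  # (6, 3)
```

The output has as many rows as the input has steps, which is the defining property of this input-to-output pairing.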

  1. Explanation:

    Yes. In a language model, we try to predict the next step based on knowledge of all prior steps.
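
Conditioning on all prior steps means the probability of a whole sentence factors into a product of next-word probabilities. A minimal sketch, where `TOY_MODEL` is a hypothetical hand-written lookup table standing in for a trained model:

```python
import math

# Hypothetical stand-in for a trained model: maps the tuple of prior words
# to a probability distribution over the next word.
TOY_MODEL = {
    (): {"cats": 0.5, "dogs": 0.5},
    ("cats",): {"sleep": 0.8, "<EOS>": 0.2},
    ("cats", "sleep"): {"<EOS>": 1.0},
}

def sentence_log_prob(words, model):
    """Sum log P(word_t | all prior words) over the sentence."""
    total = 0.0
    for t, word in enumerate(words):
        prefix = tuple(words[:t])            # everything before step t
        total += math.log(model[prefix][word])
    return total

p = math.exp(sentence_log_prob(["cats", "sleep", "<EOS>"], TOY_MODEL))
print(round(p, 3))  # 0.4
```

Here the result is 0.5 × 0.8 × 1.0 = 0.4; each factor is the model's estimate for one step given everything before it.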

  1. Explanation:

    Correct, Γu is a vector whose dimension equals the number of hidden units in the LSTM.
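
The shape claim can be checked with a short sketch of the update gate, assuming the common formulation Γu = σ(Wu [a_prev, x_t] + bu) and that NumPy is available. The function name, the sizes `n_a = 100` and `n_x = 10`, and the random weights are hypothetical:

```python
import numpy as np

def update_gate(a_prev, x_t, Wu, bu):
    """Compute Gamma_u = sigmoid(Wu @ [a_prev, x_t] + bu)."""
    concat = np.concatenate([a_prev, x_t])   # stack hidden state and input
    return 1.0 / (1.0 + np.exp(-(Wu @ concat + bu)))

n_a, n_x = 100, 10                           # hypothetical sizes
rng = np.random.default_rng(0)
gamma_u = update_gate(
    a_prev=np.zeros(n_a),
    x_t=rng.normal(size=n_x),
    Wu=rng.normal(size=(n_a, n_a + n_x)),
    bu=np.zeros(n_a),
)
print(gamma_u.shape)  # (100,)
```

Because `Wu` has one row per hidden unit, the gate has one entry per hidden unit, each between 0 and 1, regardless of the input dimension.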