Suppose your training examples are sentences (sequences of words). Which of the following refers to the j^th word in the i^th training example?

x^{(i)< j >}

x^{(j)< i >}

Explanation:
We index first into the i^th row to get the i^th training example (the superscript in parentheses), then into the j^th column to get the j^th word (the superscript in angle brackets).
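As a minimal sketch of this indexing convention, the snippet below stores a toy dataset as a list of word lists. The dataset `X` and the helper `word_at` are hypothetical names introduced only for illustration; the notation is 1-indexed, while Python lists are 0-indexed.

```python
# Hypothetical toy dataset: each training example is a list of words.
X = [
    ["the", "cat", "sat"],                # training example 1
    ["dogs", "bark", "loudly", "today"],  # training example 2
]


def word_at(X, i, j):
    """Return x^{(i)<j>}: the j-th word of the i-th example (both 1-indexed)."""
    # Pick the example first (parentheses), then the word inside it (angle brackets).
    return X[i - 1][j - 1]


print(word_at(X, 2, 3))  # loudly
```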

Consider this RNN: [Figure: RNN-example]
This specific type of architecture is appropriate when:

T_x=T_y

T_x < T_y

Explanation:
It is appropriate when every input time step should be matched to an output, so the input and output sequences have the same length.
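The T_x = T_y case can be sketched with a toy scalar RNN that emits one output per input step. The function name `rnn_many_to_many` and the weight values are assumptions chosen for illustration, not parameters from the quiz.

```python
import math


def rnn_many_to_many(xs, w_ax=0.5, w_aa=0.8, w_ya=1.0):
    """Toy scalar RNN: one output per input step, so T_y == T_x.

    The weights are arbitrary illustrative values, not trained parameters.
    """
    a = 0.0  # initial activation a^{<0>}
    ys = []
    for x in xs:
        a = math.tanh(w_aa * a + w_ax * x)  # update the hidden activation
        ys.append(w_ya * a)                 # emit an output at every step
    return ys


inputs = [1.0, -0.5, 0.25, 2.0]
outputs = rnn_many_to_many(inputs)
print(len(inputs), len(outputs))  # 4 4
```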

To which of these tasks would you apply a many-to-one RNN architecture? [Figure: RNN-example]

Speech recognition (input an audio clip and output a transcript)

Sentiment classification (input a piece of text and output a 0/1 to denote positive or negative sentiment)

To which of these tasks would you apply a many-to-one RNN architecture? [Figure: RNN-example]

Image classification (input an image and output a label)

Gender recognition from speech (input an audio clip and output a label indicating the speaker’s gender)
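A many-to-one architecture reads the whole input sequence and produces a single output at the end, which is the shape of tasks such as sentiment classification. The sketch below uses a toy scalar RNN; `rnn_many_to_one` and its weights are hypothetical and untrained.

```python
import math


def rnn_many_to_one(xs, w_ax=0.5, w_aa=0.8, w_ya=1.0, b_y=0.0):
    """Toy scalar RNN: consume every input step, output one probability.

    The weights are arbitrary illustrative values, not trained parameters.
    """
    a = 0.0  # initial activation a^{<0>}
    for x in xs:
        a = math.tanh(w_aa * a + w_ax * x)  # no output is emitted inside the loop
    z = w_ya * a + b_y
    return 1.0 / (1.0 + math.exp(-z))       # sigmoid gives a single 0-to-1 score


score = rnn_many_to_one([0.2, -0.1, 0.7])
print(0.0 < score < 1.0)  # True
```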

You are training this RNN language model. [Figure: RNN-example]
At the t^th time step, what is the RNN doing? Choose the best answer

P(y^{< 1 >}, y^{< 2 >}, ..., y^{< t-1 >})

P(y^{< t >} | y^{< 1 >}, y^{< 2 >}, ..., y^{< t-1 >})

Explanation:
Yes, in a language model we try to predict the next word given all the words that came before it.
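One way to see why the conditional form is the right answer: by the chain rule of probability, the per-step conditionals multiply into the probability of the whole sentence. Written in the same notation, with T_y as the sentence length:

```latex
P\left(y^{<1>}, y^{<2>}, \ldots, y^{<T_y>}\right)
  = \prod_{t=1}^{T_y} P\left(y^{<t>} \mid y^{<1>}, y^{<2>}, \ldots, y^{<t-1>}\right)
```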

You have finished training a language model RNN and are using it to sample random sentences, as follows: [Figure: RNN-example]
What are you doing at each time step t?

(i) Use the probabilities output by the RNN to pick the highest probability word for that time-step as \hat{y}^{< t >}. (ii) Then pass the ground-truth word from the training set to the next time-step.

(i) Use the probabilities output by the RNN to randomly sample a word for that time-step as \hat{y}^{< t >}. (ii) Then pass this selected word to the next time-step.
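The sample-then-feed-back loop can be sketched as follows. Here `toy_model` is a hypothetical stand-in for the trained RNN: it maps the words generated so far to a probability distribution over the next word. The `<EOS>` token, `max_len`, and `seed` are likewise assumptions made for the example.

```python
import random


def toy_model(history):
    """Hypothetical stand-in for a trained RNN: next-word probabilities."""
    if not history:
        return {"cats": 0.5, "dogs": 0.5}
    if len(history) == 1:
        return {"sleep": 0.6, "run": 0.4}
    return {"<EOS>": 1.0}


def sample_sentence(next_word_probs, eos="<EOS>", max_len=20, seed=0):
    rng = random.Random(seed)
    words = []
    for _ in range(max_len):
        probs = next_word_probs(words)
        vocab = list(probs)
        # (i) Randomly sample a word according to the predicted distribution.
        word = rng.choices(vocab, weights=[probs[w] for w in vocab], k=1)[0]
        if word == eos:
            break
        # (ii) The sampled word becomes part of the input to the next step.
        words.append(word)
    return words


sentence = sample_sentence(toy_model)
print(" ".join(sentence))
```

Replacing the `rng.choices` call with an argmax over `probs` would give greedy decoding instead of random sampling, which always yields the same sentence.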

You are training an RNN, and find that your weights and activations are all taking on the value of NaN (“Not a Number”). Which of these is the most likely cause of this problem?

Vanishing gradient problem.

Exploding gradient problem.
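A rough way to see why exploding gradients, rather than vanishing ones, lead to NaN: repeatedly multiplying by a factor greater than 1 overflows to infinity, and arithmetic on infinities produces NaN, whereas a factor below 1 only shrinks toward zero. The scalar product below is a crude stand-in for the gradient scale through time, not an actual backpropagation; `backprop_factor` is a hypothetical helper.

```python
import math


def backprop_factor(w, steps):
    """Multiply `steps` copies of w together, as a crude proxy for how a
    gradient is scaled when it flows back through many time steps."""
    g = 1.0
    for _ in range(steps):
        g *= w
    return g


small = backprop_factor(0.5, 2000)  # underflows to 0.0 (vanishing)
large = backprop_factor(2.0, 2000)  # overflows to inf (exploding)
update = large - large              # inf - inf is nan

print(small, large, update)  # 0.0 inf nan
```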

Suppose you are training an LSTM. You have a 10,000-word vocabulary, and are using an LSTM with 100-dimensional activations a^{< t >}. What is the dimension of Γ_u at each time step?

100

Explanation:
Correct, Γ_u is a vector whose dimension equals the number of hidden units in the LSTM.
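The shape bookkeeping can be checked with a small helper. It assumes the common formulation Γ_u = sigmoid(W_u [a^{< t-1 >}, x^{< t >}] + b_u) and assumes each word is fed in as a one-hot vector of vocabulary size; neither assumption is stated in the question, and `update_gate_shapes` is a hypothetical name.

```python
def update_gate_shapes(n_a, n_x):
    """Shapes in Gamma_u = sigmoid(W_u [a_prev, x_t] + b_u).

    Assumes a_prev has n_a entries and x_t has n_x entries, stacked into
    one column of length n_a + n_x.
    """
    return {
        "W_u": (n_a, n_a + n_x),  # maps the stacked vector to n_a values
        "b_u": (n_a,),
        "Gamma_u": (n_a,),        # one gate value per hidden unit
    }


shapes = update_gate_shapes(n_a=100, n_x=10000)
print(shapes["Gamma_u"])  # (100,)
```

The vocabulary size affects only the width of W_u; the gate itself always has one entry per hidden unit.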
