
Question:

In the example (page 359), the LSTM was character-based and had about 3.5 million parameters.

(a) How many parameters would there be in an LSTM if it were word-based, with a vocabulary of 1000 words and a hidden state of size 1000?

(b) How many parameters would there be if the vocabulary had 10,000 words and the hidden state was of size 10,000?
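One way to approach (a) and (b) is to count the weights directly. The sketch below assumes the standard single-layer LSTM language model: a one-hot input the size of the vocabulary, four gates (input, forget, output, and cell candidate), each a dense function of the concatenated input and hidden vectors, and an output projection from the hidden state to the vocabulary for the softmax. Whether biases are counted, and whether an input embedding replaces the one-hot encoding, are modeling choices that change the totals, so treat these numbers as one reasonable accounting rather than the unique answer.

```python
def lstm_param_count(vocab_size, hidden_size, with_biases=True):
    """Count parameters in a single-layer LSTM language model.

    Assumes one-hot inputs of width vocab_size, four gates that each
    map [input; hidden] to a hidden-size vector, and a dense output
    layer mapping the hidden state back to the vocabulary.
    """
    bias = 1 if with_biases else 0
    # Four gates, each (input + hidden [+ bias]) -> hidden:
    gates = 4 * (vocab_size + hidden_size + bias) * hidden_size
    # Output projection hidden [+ bias] -> vocabulary:
    output = (hidden_size + bias) * vocab_size
    return gates + output

# (a) vocabulary 1000, hidden state 1000
print(lstm_param_count(1000, 1000))        # 9005000, about 9 million
# (b) vocabulary 10,000, hidden state 10,000
print(lstm_param_count(10_000, 10_000))    # 900050000, about 900 million
```

Note how the count scales roughly quadratically in the hidden size: the gate matrices dominate, so a 10x larger vocabulary and hidden state gives about a 100x larger model.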

(c) Consider a simple character-based transformer with a single attention mechanism that performs self-attention to predict the next character in a text.

Suppose the window size is 100, the embedding size is 1000, and there are 64 characters. Suppose the transformer has dense functions producing q, k, and v as inputs to the attention mechanism, and the output of the attention goes directly into a softmax. How many parameters are there?

(d) Suppose instead of the character-based transformer in (c), the transformer was word-based, with a vocabulary of 10,000 words. How many parameters are there?
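For (c) and (d), a similar direct count works. The sketch below assumes the components the question names: an input embedding table, one dense function each for q, k, and v (each mapping embeddings to embeddings), and a final dense layer from the attention output to the vocabulary for the softmax. Whether the attention scores themselves add parameters (they do not, for plain dot-product attention), whether biases are included, and whether learned positional embeddings over the 100-position window are counted are all assumptions, so the positional term is an optional toggle here.

```python
def transformer_param_count(vocab_size, embed_size, window_size,
                            with_biases=True, learned_positions=False):
    """Count parameters in a minimal single-attention transformer.

    Assumes: an embedding table, dense q/k/v maps of shape
    embed_size -> embed_size, and a dense output layer
    embed_size -> vocab_size feeding the softmax.  Dot-product
    attention itself contributes no parameters.
    """
    bias = 1 if with_biases else 0
    embedding = vocab_size * embed_size
    qkv = 3 * (embed_size + bias) * embed_size
    output = (embed_size + bias) * vocab_size
    positions = window_size * embed_size if learned_positions else 0
    return embedding + qkv + output + positions

# (c) character-based: 64 symbols, embedding size 1000, window 100
print(transformer_param_count(64, 1000, 100))        # 3131064
# (d) word-based: vocabulary 10,000, same embedding and window
print(transformer_param_count(10_000, 1000, 100))    # 23013000
```

Under these assumptions the q/k/v maps cost the same in both cases; only the embedding table and the output layer grow with the vocabulary, which is why the word-based model is larger by roughly 2 x (10,000 - 64) x 1000 parameters.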
