Question:
In the example on page 359, the LSTM was character-based, and there were about 3.5 million parameters.
(a) How many parameters would there be in an LSTM if it was word-based with a vocabulary of 1000 words and a hidden state of size 1000?
(b) How many parameters would there be if the vocabulary had 10,000 words and the hidden state was of size 10,000?
(c) Consider a simple character-based transformer with a single attention mechanism that performs self-attention to predict the next character in a text.
Suppose the window size is 100, the embedding size is 1000, and there are 64 characters. Suppose that, as part of the transformer, there are dense functions producing q, k, and v as inputs to the attention mechanism, and that the output of the attention goes directly into a softmax. How many parameters are there?
(d) Suppose instead of the character-based transformer in (c), the transformer was word-based, with a vocabulary of 10,000 words. How many parameters are there?
Step by Step Answer:
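The counts can be worked out with a short script. This is a sketch under stated assumptions, not the book's official solution: it assumes the LSTM takes one-hot inputs of the vocabulary size, uses four gates (input, forget, output, and cell candidate), each with a weight matrix of shape h × (h + v) plus a bias, and feeds a linear layer (with bias) into the softmax; for the transformer it assumes a learned token embedding, dense q, k, v maps of shape d × d with biases, a final linear layer with bias into the softmax, and fixed (non-learned) positional encodings, so the window size of 100 contributes no parameters. Different conventions (no biases, learned positional embeddings, tied embedding and output weights) change the totals.

```python
def lstm_params(v, h):
    """Approximate parameter count for a single-layer LSTM with
    one-hot inputs of vocabulary size v and hidden state size h."""
    gates = 4 * (h * (h + v) + h)   # four gates: weights over [hidden; input] plus biases
    output = v * h + v              # linear layer (with bias) into the softmax
    return gates + output

def transformer_params(n_tokens, d):
    """Approximate parameter count for a single self-attention
    transformer with embedding size d over n_tokens tokens."""
    embed = n_tokens * d            # learned token embedding
    qkv = 3 * (d * d + d)           # dense q, k, v maps with biases
    out = n_tokens * d + n_tokens   # linear layer (with bias) into the softmax
    return embed + qkv + out

# (a) word-based LSTM, vocabulary 1000, hidden size 1000
print(lstm_params(1000, 1000))        # 9,005,000 -- about 9 million

# (b) vocabulary 10,000, hidden size 10,000
print(lstm_params(10_000, 10_000))    # 900,050,000 -- about 900 million

# (c) character-based transformer: 64 characters, embedding size 1000
print(transformer_params(64, 1000))   # 3,131,064 -- about 3.1 million

# (d) word-based transformer: 10,000 words, embedding size 1000
print(transformer_params(10_000, 1000))  # 23,013,000 -- about 23 million
```

Note how the LSTM's cost is dominated by the h × (h + v) gate matrices, which grow quadratically when v and h grow together, whereas the transformer's q/k/v parameters depend only on the embedding size, so moving from characters to a 10,000-word vocabulary mainly grows the embedding and output layers.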
Source: Artificial Intelligence: Foundations of Computational Agents, 3rd Edition, by David L. Poole and Alan K. Mackworth. ISBN 9781009258197.