Question

The question is about how many trainable weights are needed for various components of a transformer.
(a) If there are 50,000 possible tokens and the input embedding for each token has dimension 784, how many trainable weights are needed for the input embedding vectors? (You need not account for positional embeddings, which can be implemented in many different ways.)
(b) How many trainable weights are needed in a layer normalisation over 2000 activations? Briefly explain.
(c) Suppose that each transformer head has an input vector of dimension 1000 and produces an output vector of dimension 200, and that the Q (query) vectors have dimension 700. How many trainable weights are needed to compute the Q, K, and V (output) vectors from the input vector?
(d) How many trainable weights are needed for the softmax calculation that computes the attention weights for the 20th input token in a transformer with the sizes given in part 5c?
(e) In a transformer, why does the forward pass during training use more memory than a forward pass during inference? What is the extra memory used for? Explain.
(f) How much extra memory is required during the forward pass for the transformer head described in part 5c when it is applied to the 20th token in the input sequence? You need to include the results of the attention computation for the 20th token with the 19 input tokens before it.
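
A minimal sketch of the parameter arithmetic for parts (a) to (d), in Python. It rests on common conventions that are assumptions, not part of the question: the embedding is a learned lookup table with one 784-dimensional row per token; the layer normalisation has one learned gain and one learned bias per activation; the V vectors have the head's 200-dimensional output size (no separate output projection); and the Q, K, V projections have no bias unless `use_bias=True`. The K vectors must share the Q dimension (700) for the dot product to be defined. All function names are hypothetical.

```python
def embedding_params(vocab_size: int, embed_dim: int) -> int:
    # One learned embedding vector (a table row) per token in the vocabulary.
    return vocab_size * embed_dim


def layer_norm_params(num_activations: int) -> int:
    # Mean and variance are computed from the activations, not learned;
    # only the per-activation gain (gamma) and bias (beta) are trainable.
    return 2 * num_activations


def head_projection_params(d_in: int, d_qk: int, d_v: int, use_bias: bool = False) -> int:
    # W_Q and W_K map d_in -> d_qk (K must match Q for the dot product q . k);
    # W_V maps d_in -> d_v, the head's output size (assumed, no output projection).
    weights = d_in * d_qk + d_in * d_qk + d_in * d_v
    # Optional bias: one per output unit of each projection (an assumption).
    biases = (d_qk + d_qk + d_v) if use_bias else 0
    return weights + biases


def softmax_params() -> int:
    # Softmax, like the 1/sqrt(d_qk) score scaling, is a fixed function
    # of the scores: a_j = exp(s_j) / sum_i exp(s_i). Nothing is learned.
    return 0


if __name__ == "__main__":
    print("(a)", embedding_params(50_000, 784))  # (a) 39200000
    print("(b)", layer_norm_params(2000))  # (b) 4000
    print("(c)", head_projection_params(1000, 700, 200))  # (c) 1600000
    print("(c) with biases", head_projection_params(1000, 700, 200, use_bias=True))  # 1601600
    print("(d)", softmax_params())  # (d) 0
```

Under these assumptions the counts come out to 39,200,000 for (a), 4,000 for (b), 1,600,000 for (c) (1,601,600 with a bias on every projection output), and 0 for (d).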

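For (e) and (f): during training, the forward pass must keep the intermediate values that backpropagation later needs to compute gradients, whereas at inference those values can be discarded once the next computation has used them. The Python sketch below tallies the values one head might keep for the 20th token, using the sizes from part 5c. Exactly which values a course or framework counts as "extra" varies, so the optional terms are flags; the names `HeadDims` and `cached_values_for_token` are hypothetical, and causal attention (the token is scored against the 19 earlier tokens and itself) is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HeadDims:
    # Sizes from part 5c.
    d_in: int = 1000  # input vector
    d_qk: int = 700   # query and key vectors
    d_v: int = 200    # value vectors (= head output size)


def cached_values_for_token(
    dims: HeadDims,
    position: int,
    store_raw_scores: bool = False,
    store_input: bool = False,
) -> int:
    """Values one head keeps for backprop when processing token `position` (1-based).

    Assumes causal self-attention, so the token is scored against the
    position - 1 earlier tokens and itself. Earlier tokens' k and v vectors
    are assumed to have been cached when those tokens were processed.
    """
    context = position  # 19 earlier tokens + the token itself = 20 for position 20
    # This token's q, k and v: q and k_j are needed to backprop through q . k_j,
    # and v_j (together with the weights a_j) through sum_j a_j v_j.
    total = 2 * dims.d_qk + dims.d_v
    # Softmax outputs a_j; the softmax gradient can be computed from these alone.
    total += context
    if store_raw_scores:
        # Pre-softmax scores q . k_j / sqrt(d_qk), if a framework keeps them too.
        total += context
    if store_input:
        # Input x, needed for the gradients of W_Q, W_K and W_V unless the
        # previous layer already keeps it.
        total += dims.d_in
    return total


if __name__ == "__main__":
    n = cached_values_for_token(HeadDims(), position=20)
    print(n, "values,", 4 * n, "bytes at 32-bit floats")  # 1620 values, 6480 bytes at 32-bit floats
```

With the default flags this gives 700 + 700 + 200 + 20 = 1,620 values; counting the raw scores adds 20 more, and counting the input vector adds 1,000. If only the 19 earlier tokens are counted as attention partners, each context term drops by one.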