Question
The question is about how many trainable weights are needed for various components of a transformer; illustrative sketches of the relevant counts are given after part (f) below.
(a) If there are V possible tokens, and the input embedding for each token is of dimension d, how many trainable weights are needed for the input embedding vectors? You need not account for positional embeddings, which can be done by many different methods.
(b) How many trainable weights are needed in a layer normalisation applied to a vector of d activations? Briefly explain.
(c) Suppose that each transformer head has an input vector of dimension d and produces an output vector of size d_v. Suppose that the Q (query) vectors have dimension d_k. How many trainable weights are needed in computing the Q, K, and output V vectors from the input vector?
(d) How many trainable weights are needed for the softmax calculation that computes the attention weights for the i-th input token in a transformer, with sizes as in the previous part (c)?
(e) In a transformer, why does the forward pass during training use more memory than a forward pass during inference? What is the extra memory used for? Explain.
(f) How much extra memory is required during the forward pass for the transformer head described in part (c) when it is applied to the i-th token in the input sequence? You need to include the results of the attention computation for the i-th token with the input tokens before it.
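The numeric sizes in parts (a)-(d) did not survive in the question text above, so the sketch below uses placeholder values for the vocabulary size V, embedding dimension d, and per-head dimensions d_k and d_v purely for illustration. It shows the standard counts: an embedding table stores one d-dimensional vector per token, a layer normalisation has a gain and a bias per activation, each Q/K/V projection is a full matrix from the input dimension to the head dimension, and the softmax itself introduces no trainable weights.

```python
# Illustrative parameter counting for parts (a)-(d).
# V, d, d_k, d_v are assumed placeholder sizes, not the values from the question.
import torch.nn as nn

V, d, d_k, d_v = 50_000, 512, 64, 64

embedding = nn.Embedding(V, d)         # (a) one d-dim vector per token -> V * d weights
layer_norm = nn.LayerNorm(d)           # (b) a gain and a bias per activation -> 2 * d weights
W_q = nn.Linear(d, d_k, bias=False)    # (c) query projection  -> d * d_k weights
W_k = nn.Linear(d, d_k, bias=False)    #     key projection    -> d * d_k weights
W_v = nn.Linear(d, d_v, bias=False)    #     value projection  -> d * d_v weights

def n_params(module: nn.Module) -> int:
    return sum(p.numel() for p in module.parameters())

print("(a) embedding:", n_params(embedding))
print("(b) layer norm:", n_params(layer_norm))
print("(c) Q, K, V projections:", n_params(W_q) + n_params(W_k) + n_params(W_v))
print("(d) softmax:", 0)   # softmax is a fixed function of its inputs; no trainable weights
```

If the projections also carry bias terms, each adds a further d_k (or d_v) trainable weights; whether to count them depends on the convention the question assumes.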
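Part (e) turns on the fact that, during training, the intermediate activations of every layer must be kept in memory so that gradients can be computed in the subsequent backward pass, whereas at inference each activation can be discarded as soon as the next layer has consumed it. A minimal PyTorch illustration (the layer sizes are arbitrary):

```python
# Part (e): a forward pass with gradient tracking records the computation graph
# (and the activations the backward pass will need); under torch.no_grad() it does not.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(), nn.Linear(2048, 512))
x = torch.randn(8, 512)

y_train = model(x)                    # training-style forward pass: graph + saved activations
print(y_train.grad_fn is not None)    # True -> backward-pass bookkeeping is being retained

with torch.no_grad():                 # inference-style forward pass
    y_infer = model(x)                # no graph is built; activations can be freed eagerly
print(y_infer.grad_fn is None)        # True -> nothing retained beyond the output itself
```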
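For part (f), a rough way to tally the extra forward-pass memory for one head at the i-th position is to count the new query, key and value vectors, the attention weights over the tokens it attends to, and the head's output. The sketch below uses the same placeholder sizes as above and an arbitrary position i; it is an accounting illustration, not necessarily the exact breakdown the question expects (for instance, whether the token also attends to itself depends on the convention used).

```python
# Part (f): rough count of the extra numbers stored when one head processes
# the i-th token, attending to the tokens before it.  d_k, d_v, i are placeholders.
d_k, d_v, i = 64, 64, 100

q_i = d_k          # query vector for the i-th token
k_i = d_k          # key vector for the i-th token (kept so later tokens can attend to it)
v_i = d_v          # value vector for the i-th token (likewise cached)
attn = i           # one softmaxed attention weight per preceding token
out = d_v          # the head's output vector for the i-th token

extra = q_i + k_i + v_i + attn + out
print(f"extra values stored for token i={i}: {extra}")
```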