Question 2: Parameter-Efficient Transfer Learning [Jiaoda](30 pts)
Consider a vanilla encoder-decoder transformer [2].
a)(1 pt) Given the vocabulary size V and embedding dimension D, compute the number of
parameters in an embedding layer (ignore positional encodings).
b)(2 pts) How many embedding layers are there in an encoder-decoder transformer architecture?
What is the total number of parameters in the embedding layers? Is it larger than your answer in
a)? Why or why not?
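As a sanity check for parts a) and b), here is a minimal sketch. It assumes, as in the original transformer [2], that the encoder input embedding, the decoder input embedding, and the pre-softmax projection all share a single V x D weight matrix (weight tying), so the tied total equals the single-layer count.

```python
def count_embedding_params(V: int, D: int, tied: bool = True) -> int:
    """Parameters in the embedding layer(s) of an encoder-decoder transformer.

    A single embedding layer is a V x D lookup table: V * D parameters.
    With weight tying, the encoder embedding, decoder embedding, and
    pre-softmax projection share one matrix, so the total stays V * D;
    without tying, three separate matrices give 3 * V * D.
    """
    single = V * D
    return single if tied else 3 * single
```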
In an encoder layer, there are two sub-layers: a multi-head self-attention mechanism and a position-wise
fully connected feed-forward network. A residual connection is deployed around each sub-layer, followed
by layer normalization.
c)(2 pts) Compute the number of parameters in a multi-head self-attention sub-layer. Write down
all the intermediate steps and assumptions you make.
d)(2 pts) Given that the dimensionality of the intermediate layer is 4D, compute the number of
parameters in a feed-forward network.
e)(1 pt) In a decoder layer, there is an additional sub-layer: multi-head encoder-decoder attention.
Compute the number of parameters in one such sub-layer.
f)(2 pts) There is an output layer made up of a linear transformation and a softmax function that
produces next-token probabilities. Does it introduce extra parameters? Why or why not?
g)(2 pts) Given that both the encoder and the decoder have L layers, compute the total number of
parameters in the transformer.
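Parts c) through g) can be cross-checked with the sketch below. It assumes bias terms on all attention projections and feed-forward layers, two learnable vectors (gain and bias) per layer normalization, and a shared embedding/output matrix; under a different bias convention the quadratic D-squared terms are unchanged and only the linear-in-D terms shift.

```python
def attention_params(D: int) -> int:
    # Four D x D projections (W_Q, W_K, W_V, W_O), each with a bias vector.
    return 4 * (D * D + D)

def ffn_params(D: int) -> int:
    # A D -> 4D linear map and a 4D -> D linear map, each with a bias.
    return (D * 4 * D + 4 * D) + (4 * D * D + D)

def layer_norm_params(D: int) -> int:
    # One gain vector and one bias vector.
    return 2 * D

def encoder_layer_params(D: int) -> int:
    # Self-attention and FFN sub-layers, each followed by a layer norm.
    return attention_params(D) + ffn_params(D) + 2 * layer_norm_params(D)

def decoder_layer_params(D: int) -> int:
    # Self-attention, encoder-decoder attention, and FFN; three layer norms.
    return 2 * attention_params(D) + ffn_params(D) + 3 * layer_norm_params(D)

def transformer_params(V: int, D: int, L: int) -> int:
    # L encoder layers + L decoder layers + one shared V x D embedding matrix,
    # which is reused as the output projection (the softmax itself adds no
    # parameters), giving L * (28 D^2 + 32 D) + V * D under these assumptions.
    return L * (encoder_layer_params(D) + decoder_layer_params(D)) + V * D
```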
Consider the adapter network described in [1].
h)(2 pts) Given the bottleneck dimension of the adapter M, compute the number of parameters in
a single adapter module.
i)(2 pts) If we insert an adapter after each sub-layer, how many adapters are inserted in an encoder-
decoder transformer described above? Compute the total number of newly added parameters.
j)(2 pts) If we perform adapter tuning on a downstream binary classification task, what components
are trained? Compute the total number of trainable parameters.
k)(2 pts) Under what condition is adapter tuning more parameter-efficient than fine-tuning?
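For parts h) through k), a hedged sketch of the adapter of [1]: a down-projection from D to the bottleneck dimension M, a parameter-free nonlinearity, and an up-projection back to D, both maps with biases. One adapter after each sub-layer gives 2L adapters in the encoder and 3L in the decoder. For binary classification, adapter tuning would additionally train a D -> 2 classification head; whether the layer norms are also retrained is a modeling choice and is left out here.

```python
def adapter_params(D: int, M: int) -> int:
    # Down-projection D -> M and up-projection M -> D, each with a bias:
    # (D*M + M) + (M*D + D) = 2*M*D + M + D parameters per adapter.
    return (D * M + M) + (M * D + D)

def total_adapter_params(D: int, M: int, L: int) -> int:
    # One adapter after every sub-layer: 2L (encoder) + 3L (decoder) = 5L.
    return 5 * L * adapter_params(D, M)

def trainable_params_binary_classification(D: int, M: int, L: int) -> int:
    # Adapters plus a D -> 2 linear classification head (weights and biases).
    return total_adapter_params(D, M, L) + (D * 2 + 2)
```

For part k), comparing the roughly 10*L*M*D adapter parameters against the roughly 28*L*D^2 parameters touched by full fine-tuning suggests adapter tuning is more parameter-efficient whenever M is much smaller than D, since the added parameters grow linearly rather than quadratically in D.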