Question

12. In LSTM, the two gates that are responsible for updating the cell state are the __ and __ gates.
(A) input, output
(B) input, forget
(C) forget, output
(D) None of these

13. In LSTM, the __ gate(s) is/are used to update the short-term memory.
(A) input
(B) forget
(C) output
(D) All of these

14. Which of the following are advantages of Transformers over recurrent sequence models?
(A) Faster to train and run on modern hardware
(B) Better at learning short-range dependencies
(C) Require many fewer parameters to achieve similar results
(D) All of these

15. The attention mechanism is a way of
(A) determining the similarity between two sentences
(B) identifying the topic of a sentence
(C) predicting the next word in a sentence
(D) giving the importance of each word in a sentence compared to others

16. To prevent the decoder in a transformer from looking at future tokens, we add
(A) look-ahead mask
(B) context vectors
(C) softmax layer
(D) All of these

17. In the transformer encoder, the attention weights are the softmax output of the scaled dot product of __ and __.
(A) value, query
(B) key, value
(C) query, key

18. In ViT, if the patch size is 16x16x3, the vectorization of each patch has the dimension __.
(A) 256x3
(B) 768x1
(C) 256x1
(D) 768x3

19. Xavier Initialization is
(A) only used in fully connected neural networks.
(B) a scaling factor to the mean of the random weights.
(C) designed to work well with ReLU.
(D) used to make the variance of the activations the same across every layer.

20. What is the purpose of dropout regularization in deep learning?
(A) To reduce overfitting
(B) To increase the model's capacity
(C) To improve the training speed
(D) To handle imbalanced datasets

21. Which optimizer is based on both momentum and adaptive learning?
(A) RMSProp
(B) Adam
(C) SGD + momentum
(D) AdaGrad

22. Which optimizer has the problem of continually decaying adaptive learning rates?
(A) AdaGrad
(B) RMSProp
(C) Adam
(D) SGD + momentum

23. Consider a GAN model which successfully produces images of apples. Which of the following statements is false?
(A) The generator aims to learn the distribution of apple images.
(B) The discriminator can be used to classify images as apple vs. non-apple.
(C) After training the GAN, the discriminator eventually reaches a constant value.
(D) The generator can produce unseen images of apples.
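The mechanics behind questions 16 to 18 can be checked numerically. The following is a minimal sketch in plain Python, not part of the original question set. The function names (`softmax`, `attention_weights`, `flattened_patch_length`) are illustrative assumptions, and the look-ahead mask is implemented in one common way, by setting the scores of future positions to negative infinity before the softmax.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats (may contain -inf)."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys, causal=False):
    """Softmax of the scaled dot product of queries and keys.

    With causal=True a look-ahead mask is applied: position i may only
    attend to positions j <= i, as in a transformer decoder.
    """
    d_k = len(keys[0])
    weights = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            if causal and j > i:
                scores.append(float("-inf"))  # masked: future token
            else:
                dot = sum(a * b for a, b in zip(q, k))
                scores.append(dot / math.sqrt(d_k))
        weights.append(softmax(scores))
    return weights


def flattened_patch_length(height, width, channels):
    """Number of entries when one image patch is flattened into a vector."""
    return height * width * channels


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0]]
    for row in attention_weights(q, k, causal=True):
        print(row)  # each row sums to 1; masked entries are exactly 0.0
    print(flattened_patch_length(16, 16, 3))  # 768
```

With the mask enabled, the first position puts all of its weight on itself, because every later position has been masked out.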
