Question: I have this question from a past Deep Learning course exam. I want to check whether my answers are correct, so I would like the answer to the question.

Question 2. [20 MARKS] Suppose we want to train an auto-regressive generative model that can generate a short gray-scale video for a given sentence. We assume that we have access to a dataset of (sentence, video) pairs, where each sentence consists of a sequence of words (w1, ..., wn) and each video consists of a sequence of images (frames) (w1, ..., wn). We want to design the model so that, in addition to information about the already generated frames, it also attends to all the words in the sentence when deciding the next frame.

Part (a) [10 MARKS] Explain how you would design the model and what loss function you would use.

Part (b) [10 MARKS] Suppose that, with the same assumptions as before, we now want to invert the model so that the inputs are videos and the outputs are sentences describing the video. Explain how you would design the model and what loss function you would use.
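
One way to make the two parts concrete is to write down the factorization each model would learn. This is a minimal sketch under stated assumptions, not the course's official solution. The frames are renamed x_1, ..., x_m here to keep them distinct from the words (the question uses the same symbols for both), theta and phi are hypothetical names for the parameters of the two models, and the per-frame likelihood is left generic. Two common choices for it would be a per-pixel categorical distribution over intensity levels (trained with cross-entropy) or a fixed-variance Gaussian (which reduces to mean squared error).

```latex
% Part (a): sentence -> video. Each frame is predicted from the
% previously generated frames and from all words in the sentence.
p_\theta(x_1, \dots, x_m \mid w_1, \dots, w_n)
  = \prod_{t=1}^{m} p_\theta\left(x_t \mid x_{<t},\, w_{1:n}\right)

% Training loss for part (a): negative log-likelihood of the frames.
\mathcal{L}_a(\theta)
  = -\sum_{t=1}^{m} \log p_\theta\left(x_t \mid x_{<t},\, w_{1:n}\right)

% Part (b): video -> sentence. Each word is predicted from the
% previously generated words and from all frames of the video.
p_\phi(w_1, \dots, w_n \mid x_1, \dots, x_m)
  = \prod_{i=1}^{n} p_\phi\left(w_i \mid w_{<i},\, x_{1:m}\right)

% Training loss for part (b): per-token cross-entropy over the vocabulary.
\mathcal{L}_b(\phi)
  = -\sum_{i=1}^{n} \log p_\phi\left(w_i \mid w_{<i},\, x_{1:m}\right)
```

Under these assumptions, the conditioning on w_{1:n} in part (a) could be realized by letting a causally masked frame decoder attend to encoded word representations. Part (b) mirrors this with a word decoder attending to encoded frames.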
