Question

Posted on Aug 27, 2024

Deep Reinforcement Learning

Assignment 01 Problem Statement

13 Marks

Title: Propose a suitable title.

Problem Statement: Define a problem statement of your own with a well-defined objective, gaming environment, and game controls. [1 Mark]

Concept Sketch: A pen-and-paper concept sketch of the game that illustrates the proposed gaming problem statement. [1 Mark]

Additional Information: Provide any necessary information assumed or considered for the game implementation.

Requirements and Deliverables:

Elaborate on how the described problem could be solved using a deep neural network, and explain the action plan to create a gaming environment. [1 Mark]

Prepare a Colab sheet, with outputs saved, that satisfies the following requirements. The implementation should be in OpenAI Gym with Python. Develop a deep neural network architecture and training procedure that effectively learns the optimal policy for the spaceship to avoid collisions with asteroids and maximize its survival time in the game environment.

i. Environment Setup: Define the game environment, including the state space, action space, rewards, and terminal conditions. [1.5 Marks]

ii. Replay Buffer: Implement a replay buffer to store experiences (state, action, reward, next state, terminal flag). [1.5 Marks]

iii. Deep Q-Network Architecture: Design the neural network architecture for the DQN using Convolutional Neural Networks. The input to the network is the game state, and the output is the Q-values for each possible action. [2 Marks]

iv. Epsilon-Greedy Exploration: Implement an exploration strategy such as epsilon-greedy to balance exploration (trying new actions) and exploitation (using learned knowledge). [1 Mark]

v. Training Loop: Initialize the DQN and the target network (a separate network used to stabilize training). In each episode, reset the environment and observe the initial state. [2 Marks]

vi. Testing and Evaluation: After training, evaluate the DQN by running it in the environment without exploration (set epsilon to 0). Monitor metrics such as average reward per episode, survival time, etc., to assess the performance. [2 Marks]
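As a starting point for items ii and iv, the following is a minimal sketch of a replay buffer and an epsilon-greedy action selector using only the Python standard library. It is one possible shape, not a prescribed solution: the names `Transition`, `ReplayBuffer`, `epsilon_at`, and `select_action`, the linear decay schedule, and the default values (`eps_start=1.0`, `eps_end=0.05`, `decay_steps=10_000`) are all assumptions made for illustration and do not come from the assignment.

```python
import random
from collections import deque, namedtuple

# One stored experience; the fields mirror the tuple listed in item ii.
# The name "done" stands for the terminal flag (an illustrative choice).
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity FIFO store of transitions with uniform sampling."""

    def __init__(self, capacity):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from eps_start to eps_end (assumed schedule)."""
    fraction = min(step / decay_steps, 1.0)
    return eps_start + fraction * (eps_end - eps_start)


def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy: random action with probability epsilon, else argmax Q."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

During training, `select_action(q_values, epsilon_at(step))` would pick the action and `buffer.push(...)` would record the resulting transition. For the evaluation in item vi, calling `select_action(q_values, epsilon=0.0)` always returns the greedy action, which matches the "set epsilon to 0" requirement.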
