Question

Deep Reinforcement Learning
Assignment 01 Problem Statement
13 Marks
Title: Propose a suitable title
Problem Statement: Define a problem statement of your own with a well-defined
objective, gaming environment, and game controls. [1 Mark]
Concept Sketch: A pen-and-paper sketch of the game concept that illustrates the
proposed gaming problem statement. [1 Mark]
Additional Information: Provide any necessary information assumed/considered for
the game implementation.
Requirements and Deliverables:
Elaborate on how the described problem could be solved using a deep neural
network and explain the action plan to create a gaming environment. [1 Mark]
Prepare a Colab sheet with outputs saved, satisfying the following requirements.
The implementation should use OpenAI Gym with Python. Develop a deep neural
network architecture and training procedure that effectively learns the optimal
policy for the spaceship to avoid collisions with asteroids and maximize its
survival time in the game environment.
i. Environment Setup: Define the game environment, including the state
space, action space, rewards, and terminal conditions. [1.5 Marks]
ii. Replay Buffer: Implement a replay buffer to store experiences (state,
action, reward, next state, terminal flag). [1.5 Marks]
iii. Deep Q-Network Architecture: Design the neural network architecture
for the DQN using Convolutional Neural Networks. The input to the
network is the game state, and the output is the Q-values for each
possible action. [2 Marks]
iv. Epsilon-Greedy Exploration: Implement an exploration strategy such
as epsilon-greedy to balance exploration (trying new actions) and
exploitation (using learned knowledge). [1 Mark]
v. Training Loop: Initialize the DQN and the target network (a separate
network used to stabilize training). In each episode, reset the
environment and observe the initial state. [2 Marks]
vi. Testing and Evaluation: After training, evaluate the DQN by running it
in the environment without exploration (set epsilon to 0). Monitor metrics
such as average reward per episode and survival time to assess
performance. [2 Marks]
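
For requirement i, one possible starting point is a small grid world in which asteroids fall toward a spaceship on the bottom row. The sketch below is only an illustration under stated assumptions: the class name AsteroidGridEnv, the grid size, the spawn probability, and the reward values (+1 per surviving step, -10 on collision) are all hypothetical choices, not part of the assignment. It mimics the classic Gym reset()/step() interface with a 4-tuple return but does not subclass gym.Env; newer Gym and Gymnasium releases return a 5-tuple from step(), so an actual submission would need to be adapted to the installed version.

```python
import random


class AsteroidGridEnv:
    """Toy asteroid-dodging grid with a Gym-style reset()/step() interface.

    All sizes and reward values are hypothetical, chosen for illustration.
    """

    LEFT, STAY, RIGHT = 0, 1, 2  # discrete action space with 3 actions

    def __init__(self, width=8, height=10, spawn_prob=0.3, max_steps=200, seed=None):
        self.width, self.height = width, height
        self.spawn_prob = spawn_prob
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.ship_x = self.width // 2   # ship starts mid-way along the bottom row
        self.asteroids = set()          # set of (row, col) cells holding an asteroid
        self.steps = 0
        return self._observe()

    def _observe(self):
        # State: height x width grid, 0 = empty, 1 = asteroid, 2 = ship.
        grid = [[0.0] * self.width for _ in range(self.height)]
        for r, c in self.asteroids:
            grid[r][c] = 1.0
        grid[self.height - 1][self.ship_x] = 2.0
        return grid

    def step(self, action):
        # Move the ship and clamp it to the grid.
        dx = {self.LEFT: -1, self.STAY: 0, self.RIGHT: 1}[action]
        self.ship_x = min(self.width - 1, max(0, self.ship_x + dx))
        # Asteroids fall one row; those leaving the grid disappear.
        self.asteroids = {(r + 1, c) for r, c in self.asteroids if r + 1 < self.height}
        # Possibly spawn a new asteroid in the top row.
        if self.rng.random() < self.spawn_prob:
            self.asteroids.add((0, self.rng.randrange(self.width)))
        self.steps += 1
        collided = (self.height - 1, self.ship_x) in self.asteroids
        done = collided or self.steps >= self.max_steps  # terminal conditions
        reward = -10.0 if collided else 1.0
        return self._observe(), reward, done, {"collided": collided}
```

The 2D grid observation is image-like, so it could be passed to a convolutional network as a single-channel input.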
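
For requirement ii, a replay buffer can be sketched with only the Python standard library. This is a minimal version, assuming uniform random sampling and a fixed capacity with oldest-first eviction; the names ReplayBuffer, Transition, push, and sample are illustrative choices, not mandated by the assignment.

```python
import random
from collections import deque, namedtuple

# One stored experience: (state, action, reward, next state, terminal flag).
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayBuffer:
    """Fixed-capacity experience store with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen drops the oldest transition once full.
        self.memory = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        self.memory.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Sample without replacement, then regroup field by field so that
        # batch.state, batch.action, ... are each a tuple of length batch_size.
        batch = self.rng.sample(list(self.memory), batch_size)
        return Transition(*zip(*batch))

    def __len__(self):
        return len(self.memory)
```

A training loop would typically wait until len(buffer) is at least the batch size before calling sample.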
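
For requirement iv, epsilon-greedy action selection might look like the sketch below. The linear decay schedule and its default values (start at 1.0, end at 0.05 over 10,000 steps) are assumptions made for illustration; the assignment does not specify a schedule. The function names epsilon_by_step and select_action are hypothetical, and q_values stands for the list of Q-values the network outputs for one state.

```python
import random


def epsilon_by_step(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Linearly anneal epsilon from eps_start to eps_end, then hold it."""
    fraction = min(1.0, step / decay_steps)
    return eps_start + fraction * (eps_end - eps_start)


def select_action(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Calling select_action with epsilon set to 0 always returns the greedy action, which matches the evaluation setting described in requirement vi.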
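
For requirement v, the role of the target network is easiest to see in the learning target itself. In the standard DQN update, the target for a sampled transition is the reward plus the discounted maximum Q-value that the target network assigns to the next state, with no bootstrap term when the transition is terminal. The helper below sketches that calculation in plain Python; the name td_targets and the default discount of 0.99 are assumptions, and next_q_target stands for the target network's outputs on the batch of next states.

```python
def td_targets(rewards, dones, next_q_target, gamma=0.99):
    """Compute one-step DQN targets for a batch of transitions.

    rewards:        list of floats, one per transition
    dones:          list of bools, True if the transition ended the episode
    next_q_target:  list of per-action Q-value lists from the target network
    """
    targets = []
    for reward, done, q_next in zip(rewards, dones, next_q_target):
        # Terminal transitions have no future return to bootstrap from.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(reward + bootstrap)
    return targets
```

The online network is then trained to bring its Q-value for the action actually taken toward these targets, while the target network's weights are copied from the online network only periodically.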
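
For requirement vi, evaluation amounts to running the greedy policy for several episodes and averaging the metrics. The sketch below assumes an environment object exposing the classic Gym-style reset() and 4-tuple step() interface, and a policy callable that maps a state to an action; the function name evaluate and the default of 10 episodes are illustrative.

```python
def evaluate(env, policy, episodes=10):
    """Run `policy` greedily and return (average return, average episode length)."""
    returns, lengths = [], []
    for _ in range(episodes):
        state = env.reset()
        done, total, steps = False, 0.0, 0
        while not done:
            # No exploration here: the policy is used exactly as given.
            state, reward, done, _ = env.step(policy(state))
            total += reward
            steps += 1
        returns.append(total)
        lengths.append(steps)  # episode length doubles as survival time
    return sum(returns) / episodes, sum(lengths) / episodes
```

With a per-step survival reward, average episode length and average return move together, so reporting both helps separate long survival from reward-shaping effects.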
