Question

Deep Reinforcement Learning --- Please answer all questions
Assignment 01 Problem Statement
13 Marks
Title: Propose a suitable title
Problem Statement: Define a problem statement of your own with a well-defined objective, gaming environment, and game controls. [1 Mark]
Concept Sketch: A pen-and-paper concept sketch that illustrates the proposed gaming problem statement. [1 Mark]
Additional Information: Provide any necessary information assumed/considered for the game implementation.
Requirements and Deliverables:
Elaborate on how the described problem could be solved using a deep neural network, and explain the action plan for creating the gaming environment. [1 Mark]
Prepare a Colab notebook with outputs saved that satisfies the following requirements. The implementation should be in OpenAI Gym with Python. Develop a deep neural network architecture and training procedure that effectively learns the optimal policy for the spaceship to avoid collisions with asteroids and maximize its survival time in the game environment.
i. Environment Setup: Define the game environment, including the state space, action space, rewards, and terminal conditions. [1.5 Marks]
ii. Replay Buffer: Implement a replay buffer to store experiences (state, action, reward, next state, terminal flag). [1.5 Marks]
iii. Deep Q-Network Architecture: Design the neural network architecture for the DQN using Convolutional Neural Networks. The input to the network is the game state, and the output is the Q-values for each possible action. [2 Marks]
iv. Epsilon-Greedy Exploration: Implement an exploration strategy such as epsilon-greedy to balance exploration (trying new actions) and exploitation (using learned knowledge). [1 Mark]
v. Training Loop: Initialize the DQN and the target network (a separate network used to stabilize training). In each episode, reset the environment and observe the initial state. [2 Marks]
vi. Testing and Evaluation: After training, evaluate the DQN by running it in the environment without exploration (set epsilon to 0). Monitor metrics such as average reward per episode, survival time, etc., to assess the performance. [2 Marks]
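The environment setup in item i can be sketched as follows. This is a minimal illustration rather than a full solution: the class name AsteroidDodgeEnv, the grid size, the spawn probability, and the reward values (+1 per step survived, -10 on collision) are all assumptions chosen for the example. It uses only the standard library and mimics the classic Gym convention where step returns (observation, reward, done, info); newer Gymnasium releases return a five-element tuple instead, so a real submission would need to subclass the installed library's Env class and match its signature.

```python
import random


class AsteroidDodgeEnv:
    """Hypothetical grid game: a ship on the bottom row dodges falling asteroids."""

    ACTIONS = (-1, 0, 1)  # action index 0: move left, 1: stay, 2: move right

    def __init__(self, width=8, height=8, spawn_prob=0.3, max_steps=200, seed=None):
        self.width, self.height = width, height
        self.spawn_prob = spawn_prob      # assumed chance of a new asteroid per step
        self.max_steps = max_steps        # assumed episode length cap
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.ship_x = self.width // 2
        self.asteroids = set()            # (row, col) pairs; row 0 is the top
        self.steps = 0
        return self._observe()

    def _observe(self):
        """State: a height x width grid with 0 = empty, 1 = asteroid, 2 = ship."""
        grid = [[0] * self.width for _ in range(self.height)]
        for row, col in self.asteroids:
            grid[row][col] = 1
        grid[self.height - 1][self.ship_x] = 2
        return grid

    def step(self, action):
        """Apply an action index and return (observation, reward, done, info)."""
        # Move the ship, clamped to the grid edges.
        new_x = self.ship_x + self.ACTIONS[action]
        self.ship_x = min(self.width - 1, max(0, new_x))
        # Asteroids fall one row; those that leave the grid are dropped.
        self.asteroids = {(r + 1, c) for r, c in self.asteroids if r + 1 < self.height}
        # Possibly spawn a new asteroid in the top row.
        if self.rng.random() < self.spawn_prob:
            self.asteroids.add((0, self.rng.randrange(self.width)))
        self.steps += 1
        # Terminal conditions: collision with an asteroid, or the step cap.
        collided = (self.height - 1, self.ship_x) in self.asteroids
        done = collided or self.steps >= self.max_steps
        reward = -10.0 if collided else 1.0
        return self._observe(), reward, done, {}
```

The grid observation is a 2D array, which is the kind of input a convolutional DQN (item iii) expects once it is converted to a tensor with a channel dimension.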
Please provide a complete, code-based solution that can be run as-is.
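As a starting point for items ii, iv, and v, the sketch below shows a replay buffer, epsilon-greedy action selection, and the one-step Q-learning target in plain Python. The names ReplayBuffer, epsilon_greedy, and td_target are hypothetical, and the default discount factor of 0.99 is an assumption. The convolutional network itself is not shown: in a full DQN, q_values would come from the online network and next_q_values from the target network.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen discards the oldest experience once full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Return a uniformly sampled batch, regrouped by field."""
        batch = random.sample(list(self.memory), batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.memory)


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-learning target: r if terminal, else r + gamma * max_a' Q(s', a')."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```

Setting epsilon to 0 makes epsilon_greedy purely greedy, which matches the evaluation setting in item vi. During training, epsilon is typically decayed from a high value toward a small floor so that exploration gives way to exploitation.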
