Question
Below I have attached the training.py code. Give the DQN Architecture 1 code, which takes the state and the action as input and returns only one Q-value, and integrate it with training.py. Make sure you get an increasing reward trend as the episodes increase, and do not use the offline data, as mentioned in the image. The other guidelines are in the image.
import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, Lambda
from keras.optimizers import Adam
import keras.backend as K
from collections import deque
import random
# Constants (typical values; tune as needed)
BATCH_SIZE = 64
GAMMA = 0.99
EPSILON_START = 1.0
EPSILON_MIN = 0.01
EPSILON_DECAY = 0.995
LEARNING_RATE = 0.001
# Dueling DQN Model Architecture
def create_dueling_dqn_model(input_shape, action_space):
    state_input = Input(shape=(input_shape,))
    x = Dense(64, activation='relu')(state_input)  # hidden sizes are illustrative
    x = Dense(64, activation='relu')(x)
    x = Dense(64, activation='relu')(x)

    # State-value stream V(s), broadcast across all actions
    state_value = Dense(1)(x)
    state_value = Lambda(lambda s: K.expand_dims(s[:, 0], -1), output_shape=(action_space,))(state_value)

    # Advantage stream A(s, a), centred by subtracting its mean
    action_advantage = Dense(action_space)(x)
    action_advantage = Lambda(lambda a: a[:, :] - K.mean(a[:, :], keepdims=True), output_shape=(action_space,))(action_advantage)

    # Combine the two streams into Q-values
    q_values = Lambda(lambda w: w[0] + w[1], output_shape=(action_space,))([state_value, action_advantage])

    model = Model(inputs=state_input, outputs=q_values)
    model.compile(loss='mse', optimizer=Adam(lr=LEARNING_RATE))
    return model
# DQN Training Function
def DQN_training(env, offline_data, use_offline_data=False):
    state_size = env.observation_space.shape[0]
    action_size = env.action_space.n
    model = create_dueling_dqn_model(state_size, action_size)
    replay_buffer = deque(maxlen=2000)  # buffer size is illustrative
    epsilon = EPSILON_START
    total_reward_per_episode = []

    for episode in range(500):  # number of episodes (illustrative)
        state = env.reset()
        state = np.reshape(state, [1, state_size])
        total_reward = 0

        for timestep in range(1000):  # max steps in an episode (illustrative)
            if np.random.rand() <= epsilon:
                action = env.action_space.sample()  # explore action space
            else:
                q_values = model.predict(state)
                action = np.argmax(q_values[0])  # exploit learned values

            next_state, reward, done, _ = env.step(action)
            next_state = np.reshape(next_state, [1, state_size])
            total_reward += reward

            if not use_offline_data:  # only save and learn if not using offline data
                replay_buffer.append((state, action, reward, next_state, done))
                if len(replay_buffer) > BATCH_SIZE:
                    minibatch = random.sample(replay_buffer, BATCH_SIZE)
                    for s, a, r, ns, d in minibatch:
                        target = r
                        if not d:
                            target = r + GAMMA * np.amax(model.predict(ns)[0])
                        target_f = model.predict(s)
                        target_f[0][a] = target
                        model.fit(s, target_f, epochs=1, verbose=0)

            state = next_state
            if done:
                break

        total_reward_per_episode.append(total_reward)
        # Update epsilon
        epsilon = max(EPSILON_MIN, epsilon * EPSILON_DECAY)

    return model, np.array(total_reward_per_episode)
# Replace this line with any initialization of the environment required before training
# env = gym.make('LunarLander-v2')

# Do not load offline data
use_offline_data = False

# Now you would call DQN_training like this:
# final_model, total_reward_per_episode = DQN_training(env, None, use_offline_data)

# After training, you'd save your model and plot the rewards.
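A minimal sketch of the reward plot mentioned above, assuming matplotlib is available and that the array returned by DQN_training is passed in (the smoothing window is an illustrative choice):

import matplotlib.pyplot as plt
import numpy as np

def plot_reward(total_reward_per_episode):
    # Plot the raw per-episode reward plus a moving average so the trend is visible
    episodes = np.arange(len(total_reward_per_episode))
    plt.plot(episodes, total_reward_per_episode, alpha=0.4, label='episode reward')
    window = 20  # illustrative smoothing window
    if len(total_reward_per_episode) >= window:
        moving_avg = np.convolve(total_reward_per_episode, np.ones(window) / window, mode='valid')
        plt.plot(episodes[window - 1:], moving_avg, label=f'{window}-episode moving average')
    plt.xlabel('Episode')
    plt.ylabel('Total reward per episode')
    plt.legend()
    plt.savefig('reward_trend.png')
    plt.show()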
Section: Train DQN Model

In this section you will train two DQN models of the Architecture 1 type, i.e., the DQN model should accept the state and the action as input, and the output of the model should be the Q-value of the state-action pair given in the input. The first DQN model should be trained without the data collected in the earlier step, and the second one uses that data.

VERY IMPORTANT: If you code a DQN model of the Architecture 2 type, i.e., a DQN model that accepts the state as input and outputs the Q-values of all state-action pairs, you will get a ZERO for this section. There will be NO MERCY in this regard.

Deliverables: You are given a Python script, training.py. This script contains the bare basic skeleton of the DQN training code along with a function that loads the data collected in the earlier step. You must NOT change the overall structure of the skeleton. There are two functions in training.py: DQN_training and plot_reward. Your task is to write the code for these two functions. A few additional instructions: this function MUST train a DQN of Architecture 1, i.e., the DQN model should accept the state and the action as input, and the output of the model should be the Q-value of the state-action pair given in the input.
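For reference, a minimal sketch of an Architecture 1 network in Keras follows. The helper names create_architecture1_dqn and best_action, the hidden-layer sizes, and the one-hot action encoding are illustrative assumptions, not part of the provided skeleton. The state and a one-hot encoded action enter as two inputs and the model returns a single Q(s, a); greedy action selection then evaluates the model once per discrete action and takes the argmax.

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, Concatenate
from keras.optimizers import Adam

def create_architecture1_dqn(state_size, action_size, learning_rate=0.001):
    # Two inputs: the state vector and a one-hot encoding of the action
    state_input = Input(shape=(state_size,))
    action_input = Input(shape=(action_size,))
    x = Concatenate()([state_input, action_input])
    x = Dense(64, activation='relu')(x)  # hidden sizes are illustrative
    x = Dense(64, activation='relu')(x)
    q_value = Dense(1, activation='linear')(x)  # single Q(s, a) output
    model = Model(inputs=[state_input, action_input], outputs=q_value)
    model.compile(loss='mse', optimizer=Adam(lr=learning_rate))
    return model

def best_action(model, state, action_size):
    # Evaluate Q(s, a) for every discrete action and return the argmax
    states = np.repeat(np.reshape(state, (1, -1)), action_size, axis=0)
    actions = np.eye(action_size)
    q_values = model.predict([states, actions], verbose=0)
    return int(np.argmax(q_values))

Inside DQN_training, this per-action evaluation would replace the single model.predict(state) call used by an Architecture 2 network, both when choosing the greedy action and when computing the max over Q(s', a') for the TD target.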