
Question

Below I have attached the training.py code. Give the DQN Architecture 1 code, which takes the state and the action as input and returns only one Q-value, and integrate it with training.py. Make sure you get an increasing reward trend as episodes increase, and do not use the offline data, as mentioned in the image. The other guidelines are in the image.
import random
from collections import deque

import numpy as np
from keras.models import Model
from keras.layers import Input, Dense, Lambda
from keras.optimizers import Adam
import keras.backend as K

# Constants
BATCH_SIZE = 64
GAMMA = 0.99
EPSILON_START = 1.0
EPSILON_MIN = 0.01
EPSILON_DECAY = 0.995
LEARNING_RATE = 0.001
# Dueling DQN Model Architecture
def create_dueling_dqn_model(input_shape, action_space):
    state_input = Input(shape=(input_shape,))
    x = Dense(512, activation='relu')(state_input)
    x = Dense(256, activation='relu')(x)
    x = Dense(64, activation='relu')(x)

    # State-value stream V(s), broadcast against the advantage stream below
    state_value = Dense(1)(x)
    state_value = Lambda(lambda s: K.expand_dims(s[:, 0], -1),
                         output_shape=(action_space,))(state_value)

    # Advantage stream A(s, a), centred on its per-state mean over actions
    action_advantage = Dense(action_space)(x)
    action_advantage = Lambda(lambda a: a - K.mean(a, axis=1, keepdims=True),
                              output_shape=(action_space,))(action_advantage)

    # Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
    q_values = Lambda(lambda w: w[0] + w[1],
                      output_shape=(action_space,))([state_value, action_advantage])

    model = Model(inputs=state_input, outputs=q_values)
    model.compile(loss='mse', optimizer=Adam(learning_rate=LEARNING_RATE))
    return model
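Note that the model above is a dueling network that takes only the state as input and outputs Q-values for every action, i.e. Architecture type 2. The question asks for Architecture type 1: state and action in, a single Q-value out. A minimal sketch of such a model is shown below; the layer widths, the one-hot action encoding, and the function name create_architecture1_dqn_model are illustrative assumptions, not taken from training.py.

from keras.models import Model
from keras.layers import Input, Dense, Concatenate
from keras.optimizers import Adam

def create_architecture1_dqn_model(state_size, action_size):
    state_input = Input(shape=(state_size,))
    action_input = Input(shape=(action_size,))        # one-hot encoded action
    x = Concatenate()([state_input, action_input])    # join state and action features
    x = Dense(256, activation='relu')(x)
    x = Dense(64, activation='relu')(x)
    q_value = Dense(1, activation='linear')(x)        # single Q(s, a) output
    model = Model(inputs=[state_input, action_input], outputs=q_value)
    model.compile(loss='mse', optimizer=Adam(learning_rate=LEARNING_RATE))  # LEARNING_RATE defined above
    return model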
# DQN Training Function
def DQN_training(env, offline_data, use_offline_data=False):
    state_size = env.observation_space.shape[0]
    action_size = env.action_space.n
    model = create_dueling_dqn_model(state_size, action_size)
    replay_buffer = deque(maxlen=2000)
    epsilon = EPSILON_START
    total_reward_per_episode = []

    for episode in range(1000):  # Number of episodes
        state = env.reset()
        state = np.reshape(state, [1, state_size])
        total_reward = 0

        for time_step in range(500):  # Max steps in an episode
            # Epsilon-greedy action selection
            if np.random.rand() <= epsilon:
                action = env.action_space.sample()  # Explore action space
            else:
                q_values = model.predict(state)
                action = np.argmax(q_values[0])  # Exploit learned values

            next_state, reward, done, _ = env.step(action)
            next_state = np.reshape(next_state, [1, state_size])
            total_reward += reward

            if not use_offline_data:  # Only save and learn if not using offline data
                replay_buffer.append((state, action, reward, next_state, done))
                if len(replay_buffer) > BATCH_SIZE:
                    minibatch = random.sample(replay_buffer, BATCH_SIZE)
                    for s, a, r, n_s, d in minibatch:
                        # Bellman target: r + GAMMA * max_a' Q(s', a') for non-terminal steps
                        target = r
                        if not d:
                            target = r + GAMMA * np.amax(model.predict(n_s)[0])
                        target_f = model.predict(s)
                        target_f[0][a] = target
                        model.fit(s, target_f, epochs=1, verbose=0)

            state = next_state
            if done:
                break

        total_reward_per_episode.append(total_reward)
        # Decay epsilon after each episode
        epsilon = max(EPSILON_MIN, epsilon * EPSILON_DECAY)

    return model, np.array(total_reward_per_episode)
# Replace this line with any initialization of the environment required before training
# env = gym.make('LunarLander-v2')
# Do not load offline data
use_offline_data = False
# Now you would call DQN_training like this:
# final_model, total_reward_per_episode = DQN_training(env, None, use_offline_data)
# After training, you'd save your model and plot the rewards.
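training.py also expects a plot_reward function, and the question asks for an increasing reward trend across episodes. A minimal sketch is given below, assuming matplotlib is available; the 20-episode moving-average window is an arbitrary choice used only to make the trend easier to see.

import numpy as np
import matplotlib.pyplot as plt

def plot_reward(total_reward_per_episode):
    episodes = np.arange(len(total_reward_per_episode))
    plt.plot(episodes, total_reward_per_episode, alpha=0.4, label='Reward per episode')

    # Moving average to make the overall trend visible
    window = 20
    if len(total_reward_per_episode) >= window:
        moving_avg = np.convolve(total_reward_per_episode,
                                 np.ones(window) / window, mode='valid')
        plt.plot(episodes[window - 1:], moving_avg, label=f'{window}-episode moving average')

    plt.xlabel('Episode')
    plt.ylabel('Total reward')
    plt.legend()
    plt.show()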
Section 3: Train DQN Model
In this section you will train two DQN models of Architecture type 1, i.e. the DQN model should accept the state and the action as input, and the output of the model should be the Q-value of the state-action pair given in the input. The first DQN model should be trained without the data collected in step 3, and the second one uses that data.
VERY IMPORTANT: If you code a DQN model of Architecture type 2 (i.e. a DQN model that accepts the state as input and outputs the Q-values of all state-action pairs), you will get a ZERO for this section. There will be NO MERCY in this regard.
Deliverables (75 marks): You are given a Python script training.py. This script contains the bare basic skeleton of the DQN training code, along with a function that loads the data collected in step 3. You must NOT change the overall structure of the skeleton. There are two functions in training.py: DQN_training and plot_reward. Your task is to write the code for these two functions. A few additional instructions: this function MUST train a DQN of Architecture 1 (the DQN model should accept the state and the action as input, and the output of the model should be the Q-value of the state-action pair given in the input).
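With Architecture 1, the model returns one Q-value per call, so greedy action selection and the Bellman target r + GAMMA * max_a' Q(s', a') both require evaluating the model once per candidate action. A hedged sketch of one way to do this for a multi-input Keras model such as the create_architecture1_dqn_model sketch above (the helper name and one-hot encoding are assumptions): tile the state across all one-hot actions and run a single batched predict.

import numpy as np

def q_values_for_all_actions(model, state, action_size):
    """Evaluate an Architecture-1 model for every action of a single state.

    `state` is expected to have shape (1, state_size); it is repeated
    action_size times and paired with every one-hot action in one batch.
    """
    states = np.repeat(state, action_size, axis=0)      # (action_size, state_size)
    actions = np.eye(action_size)                       # one one-hot row per action
    return model.predict([states, actions]).flatten()   # shape (action_size,)

# Greedy action:  action = np.argmax(q_values_for_all_actions(model, state, action_size))
# Bellman target: target = r + GAMMA * np.max(q_values_for_all_actions(model, next_state, action_size))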
