Question

Below is the training.py template. Give DQN Architecture 1, which takes a state-action pair as input and outputs a single Q-value for that action. Also make sure the reward trend increases as episodes progress; if rewards are not increasing as episodes increase, the training is not useful.
Code-----
import numpy as np
import pandas as pd
import gymnasium as gym

def load_offline_data(path, min_score):
    state_data = []
    action_data = []
    reward_data = []
    next_state_data = []
    terminated_data = []
    dataset = pd.read_csv(path)
    dataset_group = dataset.groupby('Play #')
    for play_no, df in dataset_group:
        state = np.array(df.iloc[:, 1])
        state = np.array([np.fromstring(row[1:-1], dtype=np.float32, sep=' ') for row in state])
        action = np.array(df.iloc[:, 2]).astype(int)
        reward = np.array(df.iloc[:, 3]).astype(np.float32)
        next_state = np.array(df.iloc[:, 4])
        next_state = np.array([np.fromstring(row[1:-1], dtype=np.float32, sep=' ') for row in next_state])
        terminated = np.array(df.iloc[:, 5]).astype(int)
        total_reward = np.sum(reward)
        # Keep only episodes whose total reward meets the minimum score.
        if total_reward >= min_score:
            state_data.append(state)
            action_data.append(action)
            reward_data.append(reward)
            next_state_data.append(next_state)
            terminated_data.append(terminated)
    state_data = np.concatenate(state_data)
    action_data = np.concatenate(action_data)
    reward_data = np.concatenate(reward_data)
    next_state_data = np.concatenate(next_state_data)
    terminated_data = np.concatenate(terminated_data)
    return state_data, action_data, reward_data, next_state_data, terminated_data

def plot_reward(total_reward_per_episode, window_length):
    # This function should display:
    # (i) total reward per episode.
    # (ii) moving average of the total reward. The window for moving average
    #      should slide by one episode every time.
    pass

def DQN_training(env, offline_data, use_offline_data):
    # The function should return the final trained DQN model and total reward
    # of every episode.
    pass

# Initiate the lunar lander environment.
# NO RENDERING. It will slow the training process.
env = gym.make('LunarLander-v2')

# Load the offline data collected in step 3. Also, process the dataset.
path = 'lunar_dataset.csv'  # This should contain the path to the collected dataset.
min_score = -np.inf  # The minimum total reward of an episode that should be used for training.
offline_data = load_offline_data(path, min_score)

# Train DQN model of Architecture type 1.
use_offline_data = True  # If True, the offline data will be used. Else, offline data will not be used.
final_model, total_reward_per_episode = DQN_training(env, offline_data, use_offline_data)

# Save the final model.
final_model.save('lunar_lander_model.h5')  # This line is for Keras. Replace it with appropriate code otherwise.

# Plot reward per episode and moving average reward.
window_length = 50  # Window length for moving average reward.
plot_reward(total_reward_per_episode, window_length)

env.close()
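
A minimal sketch of DQN Architecture 1 is given below, assuming tf.keras (consistent with the .h5 save call in the template) and the LunarLander-v2 sizes (8-dimensional state, 4 discrete actions). The network takes the state concatenated with a one-hot encoding of the action and outputs a single Q-value. The layer widths, learning rate, and the helper q_values_for_all_actions are illustrative choices, not part of the original question.

Code-----
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

STATE_DIM = 8    # LunarLander-v2 observation size (assumed)
NUM_ACTIONS = 4  # LunarLander-v2 discrete action count (assumed)

def build_architecture_1(state_dim=STATE_DIM, num_actions=NUM_ACTIONS):
    # Architecture 1: the input is a state-action pair, i.e. the state
    # concatenated with a one-hot encoding of the action; the output is a
    # single Q-value for that action.
    inputs = layers.Input(shape=(state_dim + num_actions,))
    x = layers.Dense(128, activation='relu')(inputs)
    x = layers.Dense(128, activation='relu')(x)
    q_value = layers.Dense(1, activation='linear')(x)
    model = keras.Model(inputs=inputs, outputs=q_value)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
                  loss='mse')
    return model

def q_values_for_all_actions(model, states, num_actions=NUM_ACTIONS):
    # Illustrative helper: because this architecture scores one action at a
    # time, getting Q(s, a) for every action takes one forward pass per action.
    batch = states.shape[0]
    q = np.zeros((batch, num_actions), dtype=np.float32)
    for a in range(num_actions):
        one_hot = np.zeros((batch, num_actions), dtype=np.float32)
        one_hot[:, a] = 1.0
        q[:, a] = model.predict(np.hstack([states, one_hot]), verbose=0).ravel()
    return q

Greedy action selection is then int(np.argmax(q_values_for_all_actions(model, state[None])[0])). This is the main cost of Architecture 1: unlike a network that outputs all Q-values in one pass, every greedy decision and every Bellman target needs one forward pass per action.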

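One way to fill in the DQN_training stub, sketched under the same assumptions: a standard DQN loop with an experience-replay buffer optionally pre-seeded from the offline data, an epsilon-greedy policy whose epsilon decays per episode, and a target network synced periodically. All hyperparameter values are illustrative. In practice, decaying epsilon, seeding the buffer with good offline transitions, and training for enough episodes are the usual levers for the increasing reward trend the question asks for; if the moving average still does not rise, slow the epsilon decay or increase the episode count.

Code-----
import random
from collections import deque

def DQN_training(env, offline_data, use_offline_data,
                 episodes=500, gamma=0.99, batch_size=64,
                 eps_start=1.0, eps_end=0.05, eps_decay=0.995,
                 target_sync_every=1000):
    model = build_architecture_1()
    target_model = build_architecture_1()
    target_model.set_weights(model.get_weights())

    # Replay buffer, optionally pre-seeded with the offline transitions.
    buffer = deque(maxlen=100_000)
    if use_offline_data:
        s, a, r, ns, term = offline_data
        for i in range(len(s)):
            buffer.append((s[i], a[i], r[i], ns[i], term[i]))

    eps, step = eps_start, 0
    total_reward_per_episode = []
    for ep in range(episodes):
        state, _ = env.reset()
        ep_reward, done = 0.0, False
        while not done:
            # Epsilon-greedy action selection.
            if random.random() < eps:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_values_for_all_actions(model, state[None])[0]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            buffer.append((state, action, reward, next_state, terminated))
            state = next_state
            ep_reward += reward
            step += 1

            if len(buffer) >= batch_size:
                batch = random.sample(buffer, batch_size)
                bs = np.array([t[0] for t in batch], dtype=np.float32)
                ba = np.array([t[1] for t in batch], dtype=int)
                br = np.array([t[2] for t in batch], dtype=np.float32)
                bns = np.array([t[3] for t in batch], dtype=np.float32)
                bterm = np.array([t[4] for t in batch], dtype=np.float32)
                # Bellman targets from the target network; terminal
                # transitions bootstrap to zero.
                next_q = q_values_for_all_actions(target_model, bns).max(axis=1)
                targets = br + gamma * (1.0 - bterm) * next_q
                one_hot = np.eye(NUM_ACTIONS, dtype=np.float32)[ba]
                model.train_on_batch(np.hstack([bs, one_hot]),
                                     targets.reshape(-1, 1))

            # Periodically sync the target network.
            if step % target_sync_every == 0:
                target_model.set_weights(model.get_weights())

        eps = max(eps_end, eps * eps_decay)
        total_reward_per_episode.append(ep_reward)
    return model, total_reward_per_episode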
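Finally, a sketch of plot_reward with matplotlib, showing both curves the template's comments ask for; the moving-average window slides by one episode via np.convolve. An increasing trend shows up as a rising moving-average curve.

Code-----
import numpy as np
import matplotlib.pyplot as plt

def plot_reward(total_reward_per_episode, window_length):
    rewards = np.asarray(total_reward_per_episode, dtype=np.float32)
    episodes = np.arange(1, len(rewards) + 1)
    # (ii) Moving average over a window sliding one episode at a time.
    moving_avg = np.convolve(rewards,
                             np.ones(window_length) / window_length,
                             mode='valid')
    plt.figure()
    # (i) Total reward per episode.
    plt.plot(episodes, rewards, alpha=0.4, label='Total reward per episode')
    plt.plot(episodes[window_length - 1:], moving_avg,
             label=f'{window_length}-episode moving average')
    plt.xlabel('Episode')
    plt.ylabel('Total reward')
    plt.legend()
    plt.show()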

