Question

How can I structure two DQN models that take state and action as inputs and output Q-values for those state-action pairs?

I have a skeleton of the code here:

import numpy as np
import pandas as pd
import gymnasium as gym

def load_offline_data(path, min_score):
    state_data = []
    action_data = []
    reward_data = []
    next_state_data = []
    terminated_data = []
    dataset = pd.read_csv(path)
    dataset_group = dataset.groupby('Play #')
    for play_no, df in dataset_group:
        # States are stored as strings like "[x1 x2 ...]"; strip the brackets
        # and parse the whitespace-separated values (sep=' ', not sep='').
        state = np.array(df.iloc[:, 1])
        state = np.array([np.fromstring(row[1:-1], dtype=np.float32, sep=' ') for row in state])
        action = np.array(df.iloc[:, 2]).astype(int)
        reward = np.array(df.iloc[:, 3]).astype(np.float32)
        next_state = np.array(df.iloc[:, 4])
        next_state = np.array([np.fromstring(row[1:-1], dtype=np.float32, sep=' ') for row in next_state])
        terminated = np.array(df.iloc[:, 5]).astype(int)
        # Keep only episodes whose total reward reaches the minimum score.
        total_reward = np.sum(reward)
        if total_reward >= min_score:
            state_data.append(state)
            action_data.append(action)
            reward_data.append(reward)
            next_state_data.append(next_state)
            terminated_data.append(terminated)
    state_data = np.concatenate(state_data)
    action_data = np.concatenate(action_data)
    reward_data = np.concatenate(reward_data)
    next_state_data = np.concatenate(next_state_data)
    terminated_data = np.concatenate(terminated_data)
    return state_data, action_data, reward_data, next_state_data, terminated_data

def plot_reward(total_reward_per_episode, window_length):
    # This function should display:
    # (i) the total reward per episode, and
    # (ii) the moving average of the total reward, with the window sliding
    #      by one episode at a time.
    pass

def DQN_training(env, offline_data, use_offline_data):
    # This function should return the final trained DQN model and the total
    # reward of every episode.
    pass

# Initialize the lunar lander environment.
# NO RENDERING. It will slow the training process.
env = gym.make('LunarLander-v2')

# Load the offline data collected in step 3 and process the dataset.
path = 'lunar_dataset.csv'  # Path to the collected dataset.
min_score = -np.inf  # Minimum total reward an episode must have to be used for training.
offline_data = load_offline_data(path, min_score)

# Train a DQN model of architecture type 1.
use_offline_data = True  # If True, the offline data will be used; otherwise it will not.
final_model, total_reward_per_episode = DQN_training(env, offline_data, use_offline_data)

# Save the final model.
final_model.save('lunar_lander_model.h5')  # This line is for Keras. Replace with the appropriate code for your framework.

# Plot the reward per episode and the moving-average reward.
window_length = 50  # Window length for the moving-average reward.
plot_reward(total_reward_per_episode, window_length)
env.close()
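One way to structure the two DQN models (the online network and its target copy) so that each takes a state and an action as inputs and outputs the Q-value of that state-action pair is a functional model with two input branches that are concatenated and passed through dense layers. The sketch below uses Keras to match the final_model.save('lunar_lander_model.h5') line in the skeleton; the helper name build_q_model, the two 128-unit hidden layers, the Adam/MSE compile settings, and the one-hot action encoding are illustrative assumptions, not requirements of the assignment. LunarLander-v2 itself has an 8-dimensional observation and 4 discrete actions.

from tensorflow import keras
from tensorflow.keras import layers

def build_q_model(state_dim=8, n_actions=4):
    # State branch: the 8-dimensional LunarLander observation.
    state_in = keras.Input(shape=(state_dim,), name='state')
    # Action branch: the action as a one-hot vector (an assumed encoding).
    action_in = keras.Input(shape=(n_actions,), name='action')
    x = layers.Concatenate()([state_in, action_in])
    x = layers.Dense(128, activation='relu')(x)
    x = layers.Dense(128, activation='relu')(x)
    # Single linear output: the scalar Q(s, a) for this state-action pair.
    q_value = layers.Dense(1, activation='linear')(x)
    model = keras.Model(inputs=[state_in, action_in], outputs=q_value)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss='mse')
    return model

# The two DQN models: the online network trained every step, and the
# target network that is periodically synced to it.
q_model = build_q_model()
target_model = build_q_model()
target_model.set_weights(q_model.get_weights())

The trade-off of this layout: computing max_a Q(s', a) for the Bellman target needs one network evaluation per action, whereas the more common DQN head takes only the state and returns one Q-value per action in a single forward pass.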

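Inside DQN_training, the Bellman target then has to enumerate the actions explicitly, because each forward pass scores a single state-action pair. Below is a minimal sketch of one batch update under that architecture; the names dqn_update and one_hot are hypothetical helpers, and the replay batch, the build_q_model helper above, and the discount factor of 0.99 are all assumptions for illustration.

import numpy as np

def one_hot(actions, n_actions=4):
    # Integer actions -> one-hot rows for the action input branch.
    return np.eye(n_actions, dtype=np.float32)[actions]

def dqn_update(q_model, target_model, batch, gamma=0.99, n_actions=4):
    states, actions, rewards, next_states, terminated = batch
    batch_size = len(states)

    # Evaluate the target network once per action: repeat every next state
    # n_actions times, pair each copy with one one-hot action, then reshape
    # the flat predictions back to (batch_size, n_actions) and take the max.
    tiled_next = np.repeat(next_states, n_actions, axis=0)
    all_actions = one_hot(np.tile(np.arange(n_actions), batch_size))
    next_q = target_model.predict([tiled_next, all_actions], verbose=0)
    max_next_q = next_q.reshape(batch_size, n_actions).max(axis=1)

    # Standard DQN target, cut off at terminal transitions.
    targets = rewards + gamma * (1.0 - terminated) * max_next_q

    # One gradient step on the online network, fitting only the Q-values
    # of the actions that were actually taken.
    return q_model.train_on_batch([states, one_hot(actions)],
                                  targets.astype(np.float32).reshape(-1, 1))

The target network would then be refreshed periodically, e.g. target_model.set_weights(q_model.get_weights()) every fixed number of updates.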
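For plot_reward, the window that slides one episode at a time is just a length-window_length moving average, which np.convolve gives directly. A minimal sketch, assuming matplotlib is available for plotting:

import numpy as np
import matplotlib.pyplot as plt

def plot_reward(total_reward_per_episode, window_length):
    rewards = np.asarray(total_reward_per_episode, dtype=np.float32)
    episodes = np.arange(1, len(rewards) + 1)

    # Moving average over window_length episodes; 'valid' mode slides the
    # window by one episode and drops the incomplete leading windows.
    kernel = np.ones(window_length) / window_length
    moving_avg = np.convolve(rewards, kernel, mode='valid')

    plt.figure()
    plt.plot(episodes, rewards, alpha=0.4, label='Total reward per episode')
    plt.plot(episodes[window_length - 1:], moving_avg,
             label=f'{window_length}-episode moving average')
    plt.xlabel('Episode')
    plt.ylabel('Total reward')
    plt.legend()
    plt.show()

Because of 'valid' mode, the moving-average curve starts at episode window_length, once a full window of episodes is available.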
