Question


import numpy as np
import pandas as pd
import gymnasium as gym

def load_offline_data(path, min_score):
    state_data = []
    action_data = []
    reward_data = []
    next_state_data = []
    terminated_data = []
    dataset = pd.read_csv(path)
    dataset_group = dataset.groupby('Play #')
    for play_no, df in dataset_group:
        state = np.array(df.iloc[:, 1])
        state = np.array([np.fromstring(row[1:-1], dtype=np.float32, sep=' ') for row in state])
        action = np.array(df.iloc[:, 2]).astype(int)
        reward = np.array(df.iloc[:, 3]).astype(np.float32)
        next_state = np.array(df.iloc[:, 4])
        next_state = np.array([np.fromstring(row[1:-1], dtype=np.float32, sep=' ') for row in next_state])
        terminated = np.array(df.iloc[:, 5]).astype(int)
        total_reward = np.sum(reward)
        # Keep only episodes whose total reward reaches the minimum score.
        if total_reward >= min_score:
            state_data.append(state)
            action_data.append(action)
            reward_data.append(reward)
            next_state_data.append(next_state)
            terminated_data.append(terminated)
    state_data = np.concatenate(state_data)
    action_data = np.concatenate(action_data)
    reward_data = np.concatenate(reward_data)
    next_state_data = np.concatenate(next_state_data)
    terminated_data = np.concatenate(terminated_data)
    return state_data, action_data, reward_data, next_state_data, terminated_data
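# Note: the parsing above assumes the CSV columns are ordered as
# Play #, state, action, reward, next state, terminated, and that each state
# is stored as a bracketed, space-separated string such as "[ 0.01 1.40 ... ]".
# Adjust the column indices or the parsing if your dataset differs.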
def plot_reward(total_reward_per_episode, window_length):
    # This function should display:
    # (i) total reward per episode.
    # (ii) moving average of the total reward. The window for moving average
    #      should slide by one episode every time.
    pass
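# A minimal sketch of one possible plot_reward body, assuming matplotlib is
# installed; it is illustrative only and not part of the original skeleton:
#
#     import matplotlib.pyplot as plt
#     rewards = np.asarray(total_reward_per_episode, dtype=np.float32)
#     kernel = np.ones(window_length) / window_length
#     moving_avg = np.convolve(rewards, kernel, mode='valid')  # window slides by one episode
#     plt.plot(rewards, label='Total reward per episode')
#     plt.plot(np.arange(window_length - 1, len(rewards)), moving_avg, label='Moving average')
#     plt.xlabel('Episode'); plt.ylabel('Total reward'); plt.legend(); plt.show()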
def DQN_training(env, offline_data, use_offline_data):
    # The function should return the final trained DQN model and the total
    # reward of every episode.
    pass
# Initiate the lunar lander environment.
# NO RENDERING. It will slow the training process.
env = gym.make('LunarLander-v2')
# Load the offline data collected in step 3. Also, process the dataset.
path = 'lunar_dataset.csv' # This should contain the path to the collected dataset.
min_score = -np.inf  # The minimum total reward an episode must have to be used for training.
offline_data = load_offline_data(path, min_score)
# Train DQN model of Architecture type 1
use_offline_data = True  # If True, the offline data will be used; otherwise it will not.
final_model, total_reward_per_episode = DQN_training(env, offline_data, use_offline_data)
# Save the final model
final_model.save('lunar_lander_model.h5')  # This line is for Keras. Replace it with appropriate code for other frameworks.
# Plot reward per episode and moving average reward
window_length = 50  # Window length for the moving-average reward.
plot_reward(total_reward_per_episode, window_length)
env.close()
This is the skeleton of training.py.
Give me the code for DQN architecture type 1 without using offline data, i.e., take actions and collect data from the environment itself.
When you use the training.py code above, make sure the DQN architecture takes the state and the action as input and outputs a single Q-value for that action. Also make sure you get an increasing reward trend when you plot it.
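Below is a minimal sketch of what the requested pieces could look like, using Keras (consistent with the skeleton's .h5 save call): a network for architecture type 1 that takes the state and a one-hot action as input and outputs a single Q-value, a helper for greedy action selection, and an online-only DQN_training loop that ignores the offline data. The layer sizes, hyperparameters, and helper names (build_q_network, q_values_for_all_actions) are illustrative assumptions, not part of the original assignment, and the loop is a sketch rather than a verified solution.

import random
from collections import deque

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_q_network(state_dim, num_actions, learning_rate=1e-3):
    # Architecture type 1: the network takes (state, one-hot action) as input
    # and outputs a single Q-value Q(s, a) for that action.
    state_in = layers.Input(shape=(state_dim,), name='state')
    action_in = layers.Input(shape=(num_actions,), name='action_one_hot')
    x = layers.Concatenate()([state_in, action_in])
    x = layers.Dense(128, activation='relu')(x)
    x = layers.Dense(128, activation='relu')(x)
    q_value = layers.Dense(1, activation='linear')(x)
    model = Model(inputs=[state_in, action_in], outputs=q_value)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate), loss='mse')
    return model

def q_values_for_all_actions(model, states, num_actions):
    # With this architecture, greedy action selection needs one forward pass
    # per action: evaluate Q(s, a) for every a, then take the argmax.
    q = np.zeros((states.shape[0], num_actions), dtype=np.float32)
    for a in range(num_actions):
        one_hot = np.zeros((states.shape[0], num_actions), dtype=np.float32)
        one_hot[:, a] = 1.0
        q[:, a] = model.predict([states, one_hot], verbose=0).squeeze(-1)
    return q

def DQN_training(env, offline_data, use_offline_data):
    # Online-only sketch: offline_data is ignored when use_offline_data is False.
    num_actions = env.action_space.n
    state_dim = env.observation_space.shape[0]
    gamma, batch_size, num_episodes = 0.99, 64, 500
    epsilon, epsilon_min, epsilon_decay = 1.0, 0.05, 0.995
    model = build_q_network(state_dim, num_actions)
    buffer = deque(maxlen=50_000)
    total_reward_per_episode = []
    for episode in range(num_episodes):
        state, _ = env.reset()
        done, episode_reward = False, 0.0
        while not done:
            # Epsilon-greedy action selection from the environment itself.
            if np.random.rand() < epsilon:
                action = env.action_space.sample()
            else:
                q = q_values_for_all_actions(model, state[None, :].astype(np.float32), num_actions)
                action = int(np.argmax(q[0]))
            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            buffer.append((state, action, reward, next_state, float(terminated)))
            state, episode_reward = next_state, episode_reward + reward
            if len(buffer) >= batch_size:
                s, a, r, s2, term = map(np.array, zip(*random.sample(buffer, batch_size)))
                next_q = q_values_for_all_actions(model, s2.astype(np.float32), num_actions)
                # Q-learning target: r + gamma * max_a' Q(s', a'), zeroed at terminal states.
                targets = r + gamma * np.max(next_q, axis=1) * (1.0 - term)
                a_one_hot = np.eye(num_actions, dtype=np.float32)[a]
                model.train_on_batch([s.astype(np.float32), a_one_hot], targets.astype(np.float32)[:, None])
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
        total_reward_per_episode.append(episode_reward)
    return model, total_reward_per_episode

In practice a target network, more episodes, and batching the per-action forward passes (for example, calling the model directly instead of model.predict inside the loop) are usually needed before the moving-average reward trends clearly upward; treat the hyperparameters above as starting points rather than guaranteed settings.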
