Question
import numpy as np
import pandas as pd
import gymnasium as gym

def load_offline_data(path, min_score):
    state_data = []
    action_data = []
    reward_data = []
    next_state_data = []
    terminated_data = []
    dataset = pd.read_csv(path)
    dataset_group = dataset.groupby('Play #')
    for play_no, df in dataset_group:
        # Column order is assumed to be: Play #, state, action, reward,
        # next state, terminated. Adjust the iloc indices to your CSV layout.
        state = np.array(df.iloc[:, 1])
        state = np.array([np.fromstring(row[1:-1], dtype=np.float32, sep=' ') for row in state])
        action = np.array(df.iloc[:, 2]).astype(int)
        reward = np.array(df.iloc[:, 3]).astype(np.float32)
        next_state = np.array(df.iloc[:, 4])
        next_state = np.array([np.fromstring(row[1:-1], dtype=np.float32, sep=' ') for row in next_state])
        terminated = np.array(df.iloc[:, 5]).astype(int)
        total_reward = np.sum(reward)
        if total_reward >= min_score:
            state_data.append(state)
            action_data.append(action)
            reward_data.append(reward)
            next_state_data.append(next_state)
            terminated_data.append(terminated)
    state_data = np.concatenate(state_data)
    action_data = np.concatenate(action_data)
    reward_data = np.concatenate(reward_data)
    next_state_data = np.concatenate(next_state_data)
    terminated_data = np.concatenate(terminated_data)
    return state_data, action_data, reward_data, next_state_data, terminated_data

def plot_reward(total_reward_per_episode, window_length):
    # This function should display:
    # (i)  the total reward per episode, and
    # (ii) the moving average of the total reward, where the window slides
    #      by one episode every time.
    # (A hedged implementation sketch is given after the question below.)
    pass

def DQN_training(env, offline_data, use_offline_data):
    # The function should return the final trained DQN model and the total
    # reward of every episode.
    # (A hedged implementation sketch is given after the question below.)
    pass

# Initiate the lunar lander environment.
# NO RENDERING. It will slow the training process.
env = gym.make('LunarLander-v2')  # or 'LunarLander-v3', depending on your gymnasium version

# Load the offline data collected earlier. Also, process the dataset.
path = 'lunardataset.csv'  # This should contain the path to the collected dataset.
min_score = -np.inf  # The minimum total reward of an episode that should be used for training.
offline_data = load_offline_data(path, min_score)

# Train the DQN model of the required architecture type.
use_offline_data = True  # If True, the offline data will be used. Else, offline data will not be used.
final_model, total_reward_per_episode = DQN_training(env, offline_data, use_offline_data)

# Save the final model.
final_model.save('lunarlandermodel.h5')  # This line is for Keras. Replace with appropriate code otherwise.

# Plot reward per episode and the moving-average reward.
window_length = 20  # Window length for the moving-average reward (value not given above; set as required).
plot_reward(total_reward_per_episode, window_length)

env.close()
This is the skeleton of training.py.

Give me the code for the DQN architecture without using offline data, which means the agent should collect its actions and data from the environment itself. Make sure that, when you use the training.py code above, the DQN architecture takes the state and the action as input and outputs only one Q-value with respect to that action. Also make sure you get an increasing reward trend when you plot it.
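Below is a minimal sketch of one way to fill in DQN_training for the purely online case, assuming TensorFlow/Keras (the skeleton's .h5 save call suggests Keras) and a gymnasium environment with a discrete action space. The helper names build_q_network, encode and q_all_actions, and every hyperparameter value, are illustrative assumptions and not part of the original assignment. The network takes the state concatenated with a one-hot action and outputs a single Q-value, as the question requires; the offline_data and use_offline_data arguments are accepted only for compatibility with the skeleton and are ignored here.

# Hypothetical sketch only: helper names and hyperparameters are assumptions.
import random
from collections import deque

import numpy as np
from tensorflow import keras

def build_q_network(state_dim, n_actions):
    # Input: state concatenated with a one-hot action. Output: one scalar Q(s, a).
    inputs = keras.Input(shape=(state_dim + n_actions,))
    x = keras.layers.Dense(128, activation='relu')(inputs)
    x = keras.layers.Dense(128, activation='relu')(x)
    q_value = keras.layers.Dense(1, activation='linear')(x)
    model = keras.Model(inputs, q_value)
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss='mse')
    return model

def encode(states, actions, n_actions):
    # Build the (state, one-hot action) input expected by the network.
    one_hot = np.eye(n_actions, dtype=np.float32)[np.asarray(actions, dtype=int)]
    return np.concatenate([np.asarray(states, dtype=np.float32), one_hot], axis=1)

def q_all_actions(model, states, n_actions):
    # Q(s, a) for every action: one forward pass per action, because the
    # network returns only a single Q-value at a time.
    states = np.asarray(states, dtype=np.float32)
    q = np.zeros((len(states), n_actions), dtype=np.float32)
    for a in range(n_actions):
        batch = encode(states, np.full(len(states), a), n_actions)
        q[:, a] = model(batch, training=False).numpy().ravel()
    return q

def DQN_training(env, offline_data=None, use_offline_data=False,
                 episodes=600, gamma=0.99, batch_size=64, buffer_size=100_000,
                 eps_start=1.0, eps_end=0.05, eps_decay=0.995,
                 target_update_every=1000):
    # offline_data / use_offline_data are ignored in this online-only sketch.
    state_dim = env.observation_space.shape[0]
    n_actions = env.action_space.n
    model = build_q_network(state_dim, n_actions)
    target_model = build_q_network(state_dim, n_actions)
    target_model.set_weights(model.get_weights())

    buffer = deque(maxlen=buffer_size)  # replay buffer filled only from the environment
    eps = eps_start
    total_reward_per_episode = []
    step_count = 0

    for episode in range(episodes):
        state, _ = env.reset()
        episode_reward = 0.0
        done = False
        while not done:
            # Epsilon-greedy action selection using the single-Q-output network.
            if np.random.rand() < eps:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_all_actions(model, state[None, :], n_actions)[0]))

            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            buffer.append((state, action, reward, next_state, float(terminated)))
            state = next_state
            episode_reward += reward
            step_count += 1

            # Train on a random minibatch of experience collected from the environment.
            if len(buffer) >= batch_size:
                batch = random.sample(buffer, batch_size)
                s, a, r, s2, t = map(np.array, zip(*batch))
                q_next = q_all_actions(target_model, s2, n_actions).max(axis=1)
                # Bellman target r + gamma * max_a' Q_target(s', a'), cut off at termination.
                targets = (r + gamma * (1.0 - t) * q_next).astype(np.float32).reshape(-1, 1)
                model.train_on_batch(encode(s, a, n_actions), targets)

            # Periodically refresh the target network.
            if step_count % target_update_every == 0:
                target_model.set_weights(model.get_weights())

        eps = max(eps_end, eps * eps_decay)
        total_reward_per_episode.append(episode_reward)

    return model, total_reward_per_episode

Because the network outputs one Q-value per (state, action) pair, greedy action selection and target computation need one forward pass per action instead of the single pass of the usual state-to-Q-vector design. With settings in this range the per-episode reward typically trends upward over a few hundred episodes, though some hyperparameter tuning may be needed.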
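For the plot_reward stub, here is a minimal sketch assuming matplotlib (not imported in the skeleton) and a simple moving average computed with np.convolve, so the window advances one episode at a time.

# Hypothetical sketch of plot_reward; matplotlib is assumed to be available.
import numpy as np
import matplotlib.pyplot as plt

def plot_reward(total_reward_per_episode, window_length):
    rewards = np.asarray(total_reward_per_episode, dtype=np.float32)
    # Moving average over `window_length` episodes, sliding by one episode.
    kernel = np.ones(window_length) / window_length
    moving_avg = np.convolve(rewards, kernel, mode='valid')

    plt.figure(figsize=(10, 5))
    plt.plot(rewards, label='Total reward per episode', alpha=0.5)
    # The first full window ends at episode index window_length - 1.
    plt.plot(np.arange(window_length - 1, len(rewards)), moving_avg,
             label=f'Moving average (window = {window_length})')
    plt.xlabel('Episode')
    plt.ylabel('Total reward')
    plt.legend()
    plt.show()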