Question
How can I structure two DQN models that take state and action as inputs and output Q-values for those state-action pairs? I have a skeleton of the code here:

import numpy as np
import pandas as pd
import gymnasium as gym
def load_offline_data(path, min_score):
    state_data = []
    action_data = []
    reward_data = []
    next_state_data = []
    terminated_data = []

    dataset = pd.read_csv(path)
    dataset_group = dataset.groupby('Play #')  # group transitions by episode; column name taken from the original
    for play_no, df in dataset_group:
        # Column positions below are assumed; adjust them to match the dataset layout.
        state = np.array(df.iloc[:, 1])
        # States are stored as strings; the bracket-stripping slice and the separator are assumed.
        state = np.array([np.fromstring(row[1:-1], dtype=np.float32, sep=' ') for row in state])
        action = np.array(df.iloc[:, 2]).astype(int)
        reward = np.array(df.iloc[:, 3]).astype(np.float32)
        next_state = np.array(df.iloc[:, 4])
        next_state = np.array([np.fromstring(row[1:-1], dtype=np.float32, sep=' ') for row in next_state])
        terminated = np.array(df.iloc[:, 5]).astype(int)
        total_reward = np.sum(reward)
        # Keep only episodes whose total reward meets the minimum score.
        if total_reward >= min_score:
            state_data.append(state)
            action_data.append(action)
            reward_data.append(reward)
            next_state_data.append(next_state)
            terminated_data.append(terminated)

    state_data = np.concatenate(state_data)
    action_data = np.concatenate(action_data)
    reward_data = np.concatenate(reward_data)
    next_state_data = np.concatenate(next_state_data)
    terminated_data = np.concatenate(terminated_data)
    return state_data, action_data, reward_data, next_state_data, terminated_data
def plot_reward(total_reward_per_episode, window_length):
    # This function should display:
    #   (i)  the total reward per episode, and
    #   (ii) the moving average of the total reward. The moving-average window
    #        should slide by one episode each time.
    # (A possible implementation is sketched after this skeleton.)
    pass
def DQN_training(env, offline_data, use_offline_data):
    # This function should return the final trained DQN model and the total
    # reward of every episode.
    # (How the two Q-networks are used inside this loop is sketched after this skeleton.)
    pass
# Initiate the lunar lander environment.
# NO RENDERING. It will slow the training process.
env = gym.make("LunarLander-v2")  # environment ID suffix assumed; use the version registered in your gymnasium install
# Load the offline data collected earlier and process the dataset.
path = 'lunar_dataset.csv'  # This should contain the path to the collected dataset.
min_score = -np.inf  # Minimum total reward an episode must reach to be used for training (-inf keeps every episode).
offline_data = load_offline_data(path, min_score)
# Train a DQN model of the chosen architecture type.
use_offline_data = True  # If True, the offline data will be used; otherwise it will not.
final_model, total_reward_per_episode = DQN_training(env, offline_data, use_offline_data)
# Save the final model.
final_model.save('lunar_lander_model.h5')  # This line is for Keras. Replace it with the appropriate code for another framework.
# Plot the reward per episode and the moving-average reward.
window_length = 100  # Window length for the moving-average reward (placeholder value; the original was not preserved).
plot_reward(total_reward_per_episode, window_length)
env.close()
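On the core question of structuring two DQN models that take a state-action pair as input and output the Q-value of that pair, a common layout is a network with two inputs, the state vector and a one-hot encoded action, which are concatenated and passed through dense layers to a single linear output. The sketch below is a minimal, hedged example: Keras is assumed only because the skeleton saves an .h5 model, the layer sizes are arbitrary, the helper name build_q_network is illustrative, and the "two models" are interpreted here as the usual online network plus target network pair.

import tensorflow as tf
from tensorflow.keras import layers, Model

def build_q_network(state_dim, n_actions):
    # Two inputs: the state vector and a one-hot encoded action.
    state_in = layers.Input(shape=(state_dim,), name="state")
    action_in = layers.Input(shape=(n_actions,), name="action_one_hot")
    # Concatenate and map to a single Q-value for this state-action pair.
    x = layers.Concatenate()([state_in, action_in])
    x = layers.Dense(128, activation="relu")(x)
    x = layers.Dense(128, activation="relu")(x)
    q_value = layers.Dense(1, activation="linear", name="q_value")(x)
    model = Model(inputs=[state_in, action_in], outputs=q_value)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
    return model

# "Two DQN models": an online network that is trained every step and a
# target network that is only refreshed periodically (standard DQN practice).
state_dim, n_actions = 8, 4  # LunarLander observation and action sizes
online_net = build_q_network(state_dim, n_actions)
target_net = build_q_network(state_dim, n_actions)
target_net.set_weights(online_net.get_weights())  # start with identical weights

Because this architecture produces one Q-value per forward pass, greedy action selection means evaluating the network once for each of the n_actions one-hot encodings of the current state and taking the argmax.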
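Inside DQN_training, the two networks are then used in the standard way: the online network is trained on minibatches drawn from a replay buffer, which can be pre-filled with the offline data when use_offline_data is True, while the target network supplies the bootstrap term. Below is a hedged sketch of just the target computation; the helper name td_targets and the discount factor gamma=0.99 are assumptions, and the episode loop, exploration, and buffer management are omitted.

import numpy as np

def td_targets(target_net, rewards, next_states, terminated, n_actions, gamma=0.99):
    # Q_target(s', a') must be evaluated once per action, because the network
    # takes a (state, one-hot action) pair and returns a single Q-value.
    batch_size = next_states.shape[0]
    q_next = np.zeros((batch_size, n_actions), dtype=np.float32)
    for a in range(n_actions):
        one_hot = np.zeros((batch_size, n_actions), dtype=np.float32)
        one_hot[:, a] = 1.0
        q_next[:, a] = target_net.predict([next_states, one_hot], verbose=0).ravel()
    # y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal steps.
    return rewards + gamma * (1.0 - terminated) * q_next.max(axis=1)

The online network is then fit on ([states, one-hot actions], targets) with the MSE loss, and target_net.set_weights(online_net.get_weights()) is called every few hundred training steps so that the targets change slowly.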
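Finally, the moving average described in plot_reward can be computed with a window that advances by one episode at a time. The sketch below assumes matplotlib, which the skeleton does not mention, and plots both curves on the same axes.

import numpy as np
import matplotlib.pyplot as plt

def plot_reward(total_reward_per_episode, window_length):
    rewards = np.asarray(total_reward_per_episode, dtype=np.float32)
    # Moving average over `window_length` episodes, sliding by one episode at a time.
    kernel = np.ones(window_length) / window_length
    moving_avg = np.convolve(rewards, kernel, mode="valid")
    plt.figure()
    plt.plot(rewards, label="total reward per episode")
    plt.plot(np.arange(window_length - 1, len(rewards)), moving_avg,
             label=f"moving average (window={window_length})")
    plt.xlabel("Episode")
    plt.ylabel("Total reward")
    plt.legend()
    plt.show()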