Question
What is the learned Q-table for the following code? Please run the code and show the output.
import numpy as np
import matplotlib.pyplot as plt

# Grid world size
WORLDSIZE = 10  # value assumed; the original literal was lost
# Percentage of cells occupied by obstacles
OBSTACLEDENSITY = 0.2  # value assumed
# Learning parameters
ALPHA = 0.1    # learning rate (value assumed)
GAMMA = 0.9    # discount factor (value assumed)
EPSILON = 0.1  # exploration rate (value assumed)
def initializeworld():
    # Create empty grid
    world = np.zeros((WORLDSIZE, WORLDSIZE))
    # Set start in bottom left
    world[0, 0] = 0
    # Set goal in top right
    world[WORLDSIZE - 1, WORLDSIZE - 1] = 0
    # Add random obstacles (cells marked with 1)
    numobstacles = int(OBSTACLEDENSITY * WORLDSIZE ** 2)
    obstacleindices = np.random.choice(range(WORLDSIZE ** 2), size=numobstacles, replace=False)
    for i in obstacleindices:
        x = i // WORLDSIZE
        y = i % WORLDSIZE
        world[x, y] = 1
    return world
def initializeqvalues():
    # Q(s, a) initialized to 0 for all (s, a)
    qvalues = {}
    for x in range(WORLDSIZE):
        for y in range(WORLDSIZE):
            for a in range(4):  # up, down, left, right
                qvalues[(x, y, a)] = 0.0
    return qvalues
def epsilongreedy(state, qvalues, epsilon):
    # With probability epsilon, take a random action;
    # otherwise, take the greedy action under the current Q values
    if np.random.rand() < epsilon:
        action = np.random.randint(4)
    else:
        values = [qvalues[(state[0], state[1], a)] for a in range(4)]
        action = int(np.argmax(values))
    return action
def updateqvalue(state, action, reward, nextstate, qvalues, alpha, gamma):
    # Q-learning update rule
    maxqnext = max(qvalues[(nextstate[0], nextstate[1], a)] for a in range(4))
    qvalues[(state[0], state[1], action)] += alpha * (
        reward + gamma * maxqnext - qvalues[(state[0], state[1], action)])
    return qvalues
def checkgoal(state):
    return state[0] == WORLDSIZE - 1 and state[1] == WORLDSIZE - 1
if __name__ == "__main__":
    # Create world
    world = initializeworld()
    # Initialize Q values
    qvalues = initializeqvalues()
    # Track metrics
    stepsperepisode = []
    sse = []
    for episode in range(1000):  # episode count assumed; original literal lost
        # Reset agent to start position
        state = (0, 0)
        step = 0
        episodesse = 0.0
        while not checkgoal(state):
            # Choose action using epsilon-greedy
            action = epsilongreedy(state, qvalues, EPSILON)
            # Take action and get reward / next state (row 0 is the bottom row)
            if action == 0:  # up
                nextstate = (state[0] + 1, state[1])
            elif action == 1:  # down
                nextstate = (state[0] - 1, state[1])
            elif action == 2:  # left
                nextstate = (state[0], state[1] - 1)
            else:  # right
                nextstate = (state[0], state[1] + 1)
            # Clamp to the grid so indexing stays in bounds (added here;
            # the extracted code had no visible boundary check)
            nextstate = (min(max(nextstate[0], 0), WORLDSIZE - 1),
                         min(max(nextstate[1], 0), WORLDSIZE - 1))
            reward = -1  # step cost (value assumed)
            if world[nextstate]:  # Hit obstacle
                reward = -10  # obstacle penalty (value assumed)
                nextstate = state  # Stay in current state
            if checkgoal(nextstate):
                reward = 100  # goal reward (value assumed)
            # Update Q value
            qvalues = updateqvalue(state, action, reward, nextstate, qvalues, ALPHA, GAMMA)
            # Accumulate squared TD error
            episodesse += (reward
                           + GAMMA * max(qvalues[(nextstate[0], nextstate[1], a)] for a in range(4))
                           - qvalues[(state[0], state[1], action)]) ** 2
            # Update state
            state = nextstate
            step += 1
        stepsperepisode.append(step)
        sse.append(episodesse)
    # Plot results (separate figures so the curves do not overlap)
    plt.figure()
    plt.plot(stepsperepisode)
    plt.xlabel('Episode')
    plt.ylabel('Steps per episode')
    plt.savefig('steps.png')
    plt.figure()
    plt.plot(sse)
    plt.xlabel('Episode')
    plt.ylabel('Sum squared error')
    plt.savefig('sse.png')
    # Print learned policy
    policy = np.zeros((WORLDSIZE, WORLDSIZE), dtype=int)
    for x in range(WORLDSIZE):
        for y in range(WORLDSIZE):
            values = [qvalues[(x, y, a)] for a in range(4)]
            policy[x, y] = np.argmax(values)
    print("Learned Optimal Policy:")
    print(policy)
    # Print the learned Q-table (the extracted code printed an
    # undefined name `qtable`; the dict is called `qvalues`)
    print("Learned Q-table:")
    print(qvalues)
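Note that the script above never seeds the random number generator, and the obstacle layout comes from np.random.choice, so the learned Q-table differs on every run; there is no single correct output to show. To illustrate the kind of table the code produces, here is a minimal, reproducible sketch using the same tabular Q-learning update on an assumed obstacle-free 4x4 grid with a fixed seed (the grid size, reward values, and episode count here are illustrative choices, not taken from the question):

```python
import numpy as np

np.random.seed(0)  # fixed seed so the learned table is reproducible

SIZE = 4  # assumed small grid for illustration
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = [(1, 0), (-1, 0), (0, -1), (0, 1)]  # up, down, left, right

# Q-table as a dict keyed by (x, y, action), mirroring the code above
q = {(x, y, a): 0.0 for x in range(SIZE) for y in range(SIZE) for a in range(4)}

def step(state, action):
    # Move, clamping to the grid; goal is the top-right corner
    dx, dy = ACTIONS[action]
    nx = min(max(state[0] + dx, 0), SIZE - 1)
    ny = min(max(state[1] + dy, 0), SIZE - 1)
    nextstate = (nx, ny)
    reward = 100 if nextstate == (SIZE - 1, SIZE - 1) else -1
    return nextstate, reward

for episode in range(500):
    state = (0, 0)
    while state != (SIZE - 1, SIZE - 1):
        # Epsilon-greedy action selection
        if np.random.rand() < EPSILON:
            action = np.random.randint(4)
        else:
            action = int(np.argmax([q[(state[0], state[1], a)] for a in range(4)]))
        nextstate, reward = step(state, action)
        # Q-learning update
        maxqnext = max(q[(nextstate[0], nextstate[1], a)] for a in range(4))
        q[(state[0], state[1], action)] += ALPHA * (
            reward + GAMMA * maxqnext - q[(state[0], state[1], action)])
        state = nextstate

# Display the learned Q-table as one SIZE x SIZE array per action
for a, name in enumerate(["up", "down", "left", "right"]):
    table = np.array([[q[(x, y, a)] for y in range(SIZE)] for x in range(SIZE)])
    print(name)
    print(np.round(table, 1))
```

With a seed fixed like this, the printed per-action value grids are identical across runs, which is the only way to give a definite answer to "show the output"; without a seed, only the qualitative pattern (values growing toward the goal corner) is stable.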