
Question

I know the formula Q(s,a) = R(s,a) + gamma * max_a' Q(s',a'). But is this how you would implement the Q-value calculation and the state-value update for a Markov Decision Process (MDP) in Python?

class Cell:
    def __init__(self,x,y):
        self.q_values=[0.0,0.0,0.0,0.0]
        self.location=(x,y)
        self.state_value=max(self.q_values)
        self.policy=0

def computeQValue(s,action):
    print('Compute Q Values')
    #s is states: a 3*4 list which denotes the MDP grid and each element in the list is an instance of Cell class which has the above data.
    #action from value 0-3 0-east, 1-south, 2-west, 3-north
    #For each cell based on action taken the q value is calculated
    #update the state data with the q value

    for action in s[:]:
        #Do I need this state_old_value = s.state_value.copy() ?
        if action == ACTION_EAST:
            s.q_value[0] = ACTION_REWARD + GAMMA*(s.state_value+ACTION_EAST)
            s.state_value = ACTION_REWARD + GAMMA*(TRANSITION_SUCCEED*s.q_value[0] + TRANSITION_FAIL*s.q_value[1] + TRANSITION_FAIL*s.q_value[3])
        elif action == ACTION_SOUTH:
            s.q_value[1] = ACTION_REWARD + GAMMA*(s.state_value+ACTION_SOUTH)
            s.state_value = ACTION_REWARD + GAMMA*(TRANSITION_SUCCEED*s.q_value[1] + TRANSITION_FAIL*s.q_value[0] + TRANSITION_FAIL*s.q_value[2])
        elif action == ACTION_WEST:
            s.q_value[2] = ACTION_REWARD + GAMMA*(s.state_value+ACTION_WEST)
            s.state_value = ACTION_REWARD + GAMMA*(TRANSITION_SUCCEED*s.q_valu[2] + TRANSITION_FAIL*s.q_value[1] + TRANSITION_FAIL*s.q_value[3])
        else:
            s.q_value[3] = ACTION_REWARD + GAMMA*(s.state_value+ACTION_NORTH)
            s.state_value = ACTION_REWARD + GAMMA*(TRANSITION_SUCCEED*s.q_value[3] + TRANSITION_FAIL*s.q_value[2] + TRANSITION_FAIL*s.q_value[0])
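One possible way to structure this is sketched below. It is not a definitive answer. The posted code computes each Q value from the cell's own state_value plus the action number, whereas the usual Bellman backup weights the values of the neighbouring cells the action can lead to: Q(s,a) = R + gamma * sum over s' of P(s'|s,a) * V(s'), followed by V(s) = max over a of Q(s,a). The sketch copies the old state values before a sweep so that every cell is updated from the same snapshot, which is what the state_old_value question is getting at. Everything not in the question is an assumption: the constant values, the MOVES table, the helper names next_value, compute_q_value and sweep, the rule that bumping into the grid edge leaves the agent in place, and the omission of terminal states and walls.

```python
# Assumed constants (hypothetical values; the question does not state them).
GAMMA = 0.9
ACTION_REWARD = -0.04
TRANSITION_SUCCEED = 0.8
TRANSITION_FAIL = 0.1

ACTION_EAST, ACTION_SOUTH, ACTION_WEST, ACTION_NORTH = 0, 1, 2, 3
# (row, col) offset per action; row 0 is assumed to be the top row.
MOVES = {ACTION_EAST: (0, 1), ACTION_SOUTH: (1, 0),
         ACTION_WEST: (0, -1), ACTION_NORTH: (-1, 0)}


class Cell:
    def __init__(self, x, y):
        self.q_values = [0.0, 0.0, 0.0, 0.0]
        self.location = (x, y)
        self.state_value = max(self.q_values)
        self.policy = 0


def next_value(old_values, row, col, action):
    """Old value of the cell reached by `action`; leaving the grid keeps the agent in place."""
    d_row, d_col = MOVES[action]
    r, c = row + d_row, col + d_col
    if 0 <= r < len(old_values) and 0 <= c < len(old_values[0]):
        return old_values[r][c]
    return old_values[row][col]


def compute_q_value(old_values, row, col, action):
    """Q(s,a): reward plus discounted expected value over the intended move and two slips."""
    # The two perpendicular directions, matching the pairs used in the question's code.
    slips = ((action + 1) % 4, (action + 3) % 4)
    expected = TRANSITION_SUCCEED * next_value(old_values, row, col, action)
    for slip in slips:
        expected += TRANSITION_FAIL * next_value(old_values, row, col, slip)
    return ACTION_REWARD + GAMMA * expected


def sweep(grid):
    """One synchronous value-iteration sweep over every cell in the grid."""
    # Snapshot the old values so updates within this sweep do not affect each other.
    old_values = [[cell.state_value for cell in row] for row in grid]
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            cell.q_values = [compute_q_value(old_values, r, c, a) for a in range(4)]
            cell.state_value = max(cell.q_values)
            cell.policy = cell.q_values.index(cell.state_value)


# 3 rows by 4 columns, as described in the question.
grid = [[Cell(r, c) for c in range(4)] for r in range(3)]
sweep(grid)
```

Calling sweep(grid) repeatedly until the state values stop changing would give value iteration under these assumptions. Terminal cells and blocked cells would need extra handling that is left out here.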
