Question

Posted on Sep 26, 2024

I know the update rule is Q(s,a) = R(s,a) + gamma * max_a' Q(s',a'). But is this how you would implement the calculation of the Q value and the update of the state value for a Markov Decision Process (MDP) in Python?
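For reference, the rule quoted above is the deterministic special case. With stochastic transitions, as the TRANSITION_SUCCEED and TRANSITION_FAIL constants in the code below suggest, the standard Bellman optimality form takes an expectation over next states. The symbols $P$ and $V$ are not named in the question and are introduced here only for illustration:

```latex
Q(s,a) = \sum_{s'} P(s' \mid s,a)\,\bigl[ R(s,a) + \gamma \max_{a'} Q(s',a') \bigr]
       = R(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s'),
\qquad V(s) = \max_{a} Q(s,a)
```

The second equality holds because the probabilities $P(s' \mid s,a)$ sum to 1, so a reward that does not depend on $s'$ can be pulled out of the sum.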

class Cell:
    def __init__(self, x, y):
        self.q_values = [0.0, 0.0, 0.0, 0.0]
        self.location = (x, y)
        self.state_value = max(self.q_values)
        self.policy = 0

def computeQValue(s, action):
    print('Compute Q Values')
    # s is states: a 3*4 list which denotes the MDP grid and each element in the list
    # is an instance of Cell class which has the above data.
    # action from value 0-3: 0-east, 1-south, 2-west, 3-north
    # For each cell based on action taken the q value is calculated
    # update the state data with the q value

    for action in s[:]:
        # Do I need this: state_old_value = s.state_value.copy() ?
        if action == ACTION_EAST:
            s.q_value[0] = ACTION_REWARD + GAMMA*(s.state_value+ACTION_EAST)
            s.state_value = ACTION_REWARD + GAMMA*(TRANSITION_SUCCEED*s.q_value[0] + TRANSITION_FAIL*s.q_value[1] + TRANSITION_FAIL*s.q_value[3])
        elif action == ACTION_SOUTH:
            s.q_value[1] = ACTION_REWARD + GAMMA*(s.state_value+ACTION_SOUTH)
            s.state_value = ACTION_REWARD + GAMMA*(TRANSITION_SUCCEED*s.q_value[1] + TRANSITION_FAIL*s.q_value[0] + TRANSITION_FAIL*s.q_value[2])
        elif action == ACTION_WEST:
            s.q_value[2] = ACTION_REWARD + GAMMA*(s.state_value+ACTION_WEST)
            s.state_value = ACTION_REWARD + GAMMA*(TRANSITION_SUCCEED*s.q_valu[2] + TRANSITION_FAIL*s.q_value[1] + TRANSITION_FAIL*s.q_value[3])
        else:
            s.q_value[3] = ACTION_REWARD + GAMMA*(s.state_value+ACTION_NORTH)
            s.state_value = ACTION_REWARD + GAMMA*(TRANSITION_SUCCEED*s.q_value[3] + TRANSITION_FAIL*s.q_value[2] + TRANSITION_FAIL*s.q_value[0])
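For comparison with the attempt above, here is a minimal sketch of one synchronous value-iteration sweep over the grid. It is one possible reading of the setup, not a definitive answer. Assumptions not stated in the question: the numeric values of GAMMA, ACTION_REWARD, TRANSITION_SUCCEED and TRANSITION_FAIL; that row 0 is the top row; that a failed move slips to one of the two perpendicular directions (matching the index pairs used in the question); and that moving into a wall leaves the agent in place. Terminal cells and obstacles are left out. The helper names `next_value`, `compute_q_value` and `value_iteration_sweep` are hypothetical.

```python
# Assumed values: the question names these constants but does not define them.
GAMMA = 0.9
ACTION_REWARD = -0.04
TRANSITION_SUCCEED = 0.8
TRANSITION_FAIL = 0.1

ACTION_EAST, ACTION_SOUTH, ACTION_WEST, ACTION_NORTH = 0, 1, 2, 3
# (row, col) offsets per action; row 0 is assumed to be the top row.
MOVES = {ACTION_EAST: (0, 1), ACTION_SOUTH: (1, 0),
         ACTION_WEST: (0, -1), ACTION_NORTH: (-1, 0)}


class Cell:
    def __init__(self, x, y):
        self.q_values = [0.0, 0.0, 0.0, 0.0]
        self.location = (x, y)
        self.state_value = max(self.q_values)
        self.policy = 0


def next_value(old_values, row, col, direction):
    """Value of the cell reached by moving in `direction`; hitting a wall means staying put."""
    d_row, d_col = MOVES[direction]
    r, c = row + d_row, col + d_col
    if 0 <= r < len(old_values) and 0 <= c < len(old_values[0]):
        return old_values[r][c]
    return old_values[row][col]


def compute_q_value(old_values, row, col, action):
    """Q(s,a) = R + gamma * sum over s' of P(s'|s,a) * V(s')."""
    # With the 0-east, 1-south, 2-west, 3-north ordering, the two
    # perpendicular directions are (action - 1) % 4 and (action + 1) % 4.
    left, right = (action - 1) % 4, (action + 1) % 4
    expected = (TRANSITION_SUCCEED * next_value(old_values, row, col, action)
                + TRANSITION_FAIL * next_value(old_values, row, col, left)
                + TRANSITION_FAIL * next_value(old_values, row, col, right))
    return ACTION_REWARD + GAMMA * expected


def value_iteration_sweep(grid):
    """Update every cell's q_values, state_value and policy from the previous sweep's values."""
    # Snapshot first, so cells updated early in the sweep do not affect later ones.
    old_values = [[cell.state_value for cell in row] for row in grid]
    for r, row in enumerate(grid):
        for c, cell in enumerate(row):
            cell.q_values = [compute_q_value(old_values, r, c, a) for a in range(4)]
            cell.state_value = max(cell.q_values)
            cell.policy = cell.q_values.index(cell.state_value)
```

Under these assumptions, a grid can be built with `grid = [[Cell(r, c) for c in range(4)] for r in range(3)]` and `value_iteration_sweep(grid)` called repeatedly until the state values stop changing. Two differences from the attempt are worth noting: the Q value looks up the values of neighbouring cells rather than adding the action index to the cell's own value, and the state value is the maximum over the four Q values rather than a weighted sum of them. On the `.copy()` question, a float has no `copy` method and does not need one; the snapshot list serves the purpose of keeping the old values.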
