Question

Question 4 [15 pt]: Consider the following gridworld. Double-rectangle states are exit states. From an exit state, the only action available is Exit, which results in the listed reward and ends the game (by moving into a terminal state X, not shown). From non-exit states, the agent can choose either Left or Right actions, which move the agent in the corresponding direction. There are no living rewards; the only non-zero rewards come from exiting the grid. Throughout this problem, assume that value iteration begins with initial values $V_0(s) = 0$ for all states $s$.

[Figure: gridworld with exit rewards +1 and +10]

1. If $\gamma = 0.5$ and legal movement actions always succeed,
(a) What is the optimal value $V^*(b)$?
(b) For what range of values of the discount $\gamma$ will it be optimal to go Right from b? Remember that $0 \le \gamma \le 1$.

2. Now suppose the Left and Right movement actions are stochastic and succeed with probability 0.5. When an action fails, the agent moves up or down with probability 0.25 each. When there is no square to move up or down into (as in the one-dimensional case), the agent stays in place. The Exit action does not fail. The discount is $\gamma = 1$.
(a) What is the optimal value $V^*(b)$?
(b) When running value iteration, what is the smallest value of $k$ for which $V_k(b)$ will be non-zero?
(c) After how many iterations $k$ will we have $V_k(b) = V^*(b)$? If they will never become equal, write never.
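The value-iteration procedure the question refers to can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the figure is not available, so the grid is assumed to be a one-dimensional corridor whose leftmost and rightmost cells are the exit states, and the corridor length is a hypothetical parameter. The function name `value_iteration` and all of its arguments are illustrative, not part of the original problem. A failed move is modelled as staying in place, as the question specifies for the one-dimensional case.

```python
from typing import List


def value_iteration(
    n_states: int,
    left_reward: float,
    right_reward: float,
    gamma: float,
    p_success: float,
    iterations: int,
) -> List[List[float]]:
    """Run value iteration on an ASSUMED 1-D corridor.

    Cell 0 and cell n_states - 1 are exit states; every other cell
    offers Left and Right. Returns [V_0, V_1, ..., V_iterations].
    """
    values = [0.0] * n_states          # V_0(s) = 0 for all states
    history = [values[:]]
    p_stay = 1.0 - p_success           # a failed move leaves the agent in place

    for _ in range(iterations):
        new_values = [0.0] * n_states
        # Exit pays the listed reward and leads to the terminal state (value 0).
        new_values[0] = left_reward
        new_values[-1] = right_reward
        for s in range(1, n_states - 1):
            # No living reward, so each Q-value is just the discounted
            # expected value of the successor state.
            q_left = gamma * (p_success * values[s - 1] + p_stay * values[s])
            q_right = gamma * (p_success * values[s + 1] + p_stay * values[s])
            new_values[s] = max(q_left, q_right)
        values = new_values
        history.append(values[:])
    return history


if __name__ == "__main__":
    # Hypothetical 5-cell corridor; the real layout comes from the missing figure.
    for k, v in enumerate(value_iteration(5, 1.0, 10.0, 0.5, 1.0, 5)):
        print(k, v)
```

Printing the successive value vectors shows how a non-zero value propagates one cell per iteration outward from each exit, which is the mechanism behind the "smallest k" parts of the question. The numbers it prints apply to the assumed corridor only and should be recomputed for the actual grid.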

