Question

1 Approved Answer

Posted on Sep 24, 2024

Question 4 [15 pt]: Consider the following gridworld. Double-rectangle states are exit states. From an exit state, the only action available is Exit, which results

image text in transcribed

Question 4 [15 pt]: Consider the following gridworld. Double-rectangle states are exit states. From an exit state, the only action available is Exit, which results in the listed reward and ends the game (by moving into a terminal state X, not shown). From non-exit states, the agent can choose either Left or Right actions, which move the agent in the corresponding direction. There are no living rewards; the only non-zero rewards come from exiting the grid Throughout this problem, assume that value iteration begins with initial values Vo(s)-0 for all states s. +1 +10 I. If = 0.5 and legal movement actions always succeed, (a) What is the optimal value V"(b)? (b) For what range of values of the discount will it be optimal to go Right from b? Remember that 0 1. 2. Now consider the Left and Right movement actions are stochastic and succeed with probability 0.5. When an action fails, the agent moves up or down with probability 0.25 each. When there is no square to move up or down into (as in the one-dimensional case), the agent stays in place. The Exit action does not fail. The discount is = i. (a) What is the optimal value V"(b)? (b) When running value iteration, what is the smallest value of k for which K(b) will be non-zero? (c) After how many iterations k will we have Volb) = V. (b)? If they will never become equal, write never. Question 4 [15 pt]: Consider the following gridworld. Double-rectangle states are exit states. From an exit state, the only action available is Exit, which results in the listed reward and ends the game (by moving into a terminal state X, not shown). From non-exit states, the agent can choose either Left or Right actions, which move the agent in the corresponding direction. There are no living rewards; the only non-zero rewards come from exiting the grid Throughout this problem, assume that value iteration begins with initial values Vo(s)-0 for all states s. +1 +10 I. If = 0.5 and legal movement actions always succeed, (a) What is the optimal value V"(b)? (b) For what range of values of the discount will it be optimal to go Right from b? Remember that 0 1. 2. Now consider the Left and Right movement actions are stochastic and succeed with probability 0.5. When an action fails, the agent moves up or down with probability 0.25 each. When there is no square to move up or down into (as in the one-dimensional case), the agent stays in place. The Exit action does not fail. The discount is = i. (a) What is the optimal value V"(b)? (b) When running value iteration, what is the smallest value of k for which K(b) will be non-zero? (c) After how many iterations k will we have Volb) = V. (b)? If they will never become equal, write never