Answered step by step
Verified Expert Solution
Question
1 Approved Answer
Question 4 [15 pt]: Consider the following gridworld. Double-rectangle states are exit states. From an exit state, the only action available is Exit, which results
Question 4 [15 pt]: Consider the following gridworld. Double-rectangle states are exit states. From an exit state, the only action available is Exit, which results in the listed reward and ends the game (by moving into a terminal state X, not shown). From non-exit states, the agent can choose either Left or Right actions, which move the agent in the corresponding direction. There are no living rewards; the only non-zero rewards come from exiting the grid Throughout this problem, assume that value iteration begins with initial values Vo(s)-0 for all states s. +1 +10 I. If = 0.5 and legal movement actions always succeed, (a) What is the optimal value V"(b)? (b) For what range of values of the discount will it be optimal to go Right from b? Remember that 0 1. 2. Now consider the Left and Right movement actions are stochastic and succeed with probability 0.5. When an action fails, the agent moves up or down with probability 0.25 each. When there is no square to move up or down into (as in the one-dimensional case), the agent stays in place. The Exit action does not fail. The discount is = i. (a) What is the optimal value V"(b)? (b) When running value iteration, what is the smallest value of k for which K(b) will be non-zero? (c) After how many iterations k will we have Volb) = V. (b)? If they will never become equal, write never. Question 4 [15 pt]: Consider the following gridworld. Double-rectangle states are exit states. From an exit state, the only action available is Exit, which results in the listed reward and ends the game (by moving into a terminal state X, not shown). From non-exit states, the agent can choose either Left or Right actions, which move the agent in the corresponding direction. There are no living rewards; the only non-zero rewards come from exiting the grid Throughout this problem, assume that value iteration begins with initial values Vo(s)-0 for all states s. +1 +10 I. If = 0.5 and legal movement actions always succeed, (a) What is the optimal value V"(b)? (b) For what range of values of the discount will it be optimal to go Right from b? Remember that 0 1. 2. Now consider the Left and Right movement actions are stochastic and succeed with probability 0.5. When an action fails, the agent moves up or down with probability 0.25 each. When there is no square to move up or down into (as in the one-dimensional case), the agent stays in place. The Exit action does not fail. The discount is = i. (a) What is the optimal value V"(b)? (b) When running value iteration, what is the smallest value of k for which K(b) will be non-zero? (c) After how many iterations k will we have Volb) = V. (b)? If they will never become equal, write never
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started