Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

For an MDP (S, A, T, R,y), let Vo: SR be an initial guess of the optimal value function V*. Suppose that this guess

For an MDP (S, A, T, R,y), let Vo: SR be an initial guess of the optimal value function V*. Suppose that this guess is progressively updated using Value Iteration: that is, by setting Vt+1+B* (Vt) for t = 0, 1, 2,.... Recall that B* is the Bellman optimality operator. In this question, we examine the design of a stopping condition for Value Iteration. As usual, let ||-|| denote the max norm. We would like that our computed solution, V for some u {1,2,...}, be within e of V* for some given tolerance > 0. In other words, we would like to stop after u applications of B*, so long as we can guarantee ||Vu-V*|| e. Naturally, we cannot use V* itself in our stopping rule, since it is not known! Show that it suffices to stop when (1-7) Vu-Vu-1||0 Y and thereafter return V as the answer. You are likely to find two results handy: (1) that B* is a contraction mapping with contraction factory, and (2) the triangle inequality: for X: SR,Y: SR, || X + Y|| ||X|| + ||Y||o.

Step by Step Solution

3.43 Rating (159 Votes )

There are 3 Steps involved in it

Step: 1

The detailed ... blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Introductory Statistics

Authors: Prem S. Mann

8th Edition

9781118473986, 470904100, 1118473981, 978-0470904107

More Books

Students also viewed these Computer Engineering questions