
Question:

Exercise 9.17 Consider a grid world where the action “up” has the following dynamics:

That is, it goes up with probability 0.8, up-left with probability 0.1, and up-right with probability 0.1. Suppose we have the following states:
s12 s13 s14
s17 s18 s19

There is a reward of +10 upon entering state s14, and a reward of −5 upon entering state s19. All other rewards are 0. The discount is 0.9.
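A minimal sketch of how the “up” dynamics act on these states, assuming (as the state numbering suggests) that s12, s13 and s14 form the row directly above s17, s18 and s19. The names GRID, UP_OUTCOMES and up_distribution are hypothetical. The exercise does not say what happens when an outcome would leave the listed states, so the sketch simply raises an error in that case instead of guessing.

```python
# Hypothetical encoding of the six states as (row, col) cells, assuming
# s12 s13 s14 sit directly above s17 s18 s19 (row 0 is the top row).
GRID = {
    "s12": (0, 0), "s13": (0, 1), "s14": (0, 2),
    "s17": (1, 0), "s18": (1, 1), "s19": (1, 2),
}
CELL_TO_STATE = {cell: name for name, cell in GRID.items()}

# Stochastic outcomes of "up": (row offset, col offset, probability).
# Up with 0.8, up-left with 0.1, up-right with 0.1.
UP_OUTCOMES = [(-1, 0, 0.8), (-1, -1, 0.1), (-1, 1, 0.1)]


def up_distribution(state):
    """Return {next_state: probability} for action "up" from `state`.

    Only defined when every outcome lands on one of the six listed states;
    the exercise does not specify behaviour at the edge of this fragment.
    """
    row, col = GRID[state]
    dist = {}
    for dr, dc, p in UP_OUTCOMES:
        target = CELL_TO_STATE.get((row + dr, col + dc))
        if target is None:
            raise ValueError(f"'up' from {state} can leave the listed states")
        dist[target] = dist.get(target, 0.0) + p
    return dist


print(up_distribution("s18"))  # prints {'s13': 0.8, 's12': 0.1, 's14': 0.1}
```

From s18 every outcome stays among the listed states: s13 with probability 0.8, s12 with 0.1 and s14 with 0.1. These are exactly the successors a Q[s18, up] update needs.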

Suppose we are doing asynchronous value iteration, storing Q[S,A], and we have the following values for these states:

V(s12) = 5
V(s13) = 7
V(s14) = −3
V(s17) = 2
V(s18) = 4
V(s19) = −6

Suppose, in the next step of asynchronous value iteration, we select state s18 and action “up”. What is the resulting updated value for Q[s18, up]? Give the numerical formula, but do not evaluate or simplify it.
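For reference, one step of asynchronous value iteration on a stored Q-table replaces Q[s, a] with the expected reward plus discounted successor value, Q[s, a] = Σ_{s'} P(s' | s, a) × (R(s, a, s') + γ × V(s')), where V(s') = max_{a'} Q[s', a']. Assuming s12, s13 and s14 are the up-left, up and up-right neighbours of s18, and that the listed V values are those maxima, the unevaluated formula is:

Q[s18, up] = 0.8 × (0 + 0.9 × 7) + 0.1 × (0 + 0.9 × 5) + 0.1 × (10 + 0.9 × (−3))

The exercise asks only for this formula. The Python sketch that follows evaluates it purely as a sanity check; the names q_backup and REWARD_ON_ENTRY are hypothetical.

```python
# One asynchronous value-iteration backup for a single (state, action) pair.
# V values and rewards are taken from the exercise; the outcome distribution
# assumes s12, s13, s14 are the up-left, up and up-right neighbours of s18.

GAMMA = 0.9
V = {"s12": 5, "s13": 7, "s14": -3, "s17": 2, "s18": 4, "s19": -6}
REWARD_ON_ENTRY = {"s14": 10, "s19": -5}  # every other transition earns 0


def q_backup(outcomes, values, gamma):
    """Return the sum over s' of P(s'|s,a) * (R(s') + gamma * V(s')).

    `outcomes` maps each successor state to its probability; the reward
    depends only on the state entered, as in this exercise.
    """
    return sum(
        p * (REWARD_ON_ENTRY.get(s_next, 0) + gamma * values[s_next])
        for s_next, p in outcomes.items()
    )


up_from_s18 = {"s13": 0.8, "s12": 0.1, "s14": 0.1}
q = q_backup(up_from_s18, V, GAMMA)
print(round(q, 2))  # prints 6.22
```

The three terms contribute 5.04, 0.45 and 0.73 respectively, so the likely move into s13 dominates the update even though entering s14 carries the +10 reward.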
