Question:
Exercise 9.17 Consider a grid world where the action “up” has the following dynamics:
That is, it goes up with probability 0.8, up-left with probability 0.1, and up-right with probability 0.1. Suppose we have the following states:
    s12  s13  s14
    s17  s18  s19

There is a reward of +10 upon entering state s14, and a reward of −5 upon entering state s19. All other rewards are 0.
The discount is 0.9.
Suppose we are doing asynchronous value iteration, storing Q[S,A], and we have the following values for these states:
V(s12) = 5, V(s13) = 7, V(s14) = −3
V(s17) = 2, V(s18) = 4, V(s19) = −6

Suppose, in the next step of asynchronous value iteration, we select state s18 and action up. What is the resulting updated value for Q[s18, up]? Give the numerical formula, but do not evaluate or simplify it.
Step by Step Answer:
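A possible worked sketch follows. It assumes the six states form a two-row grid with s12, s13, s14 on top and s17, s18, s19 directly below them, so that from s18 the action up reaches s13 (up), s12 (up-left), and s14 (up-right). It also assumes V(s') stands for the maximum of Q[s', a'] over actions a'. Under those assumptions the asynchronous update Q[s, a] = Σ P(s' | s, a) (R(s, a, s') + γ V(s')) gives:

Q[s18, up] = 0.8 × (0 + 0.9 × 7) + 0.1 × (0 + 0.9 × 5) + 0.1 × (10 + 0.9 × (−3))

The same computation in Python, where the names q_update, UP_FROM_S18, and ENTRY_REWARD are illustrative and not taken from the book:

```python
# Sketch of one asynchronous value-iteration update for Q[s18, up].
# ASSUMPTION: grid layout is   s12 s13 s14
#                              s17 s18 s19
# so "up" from s18 leads to s13, up-left to s12, up-right to s14.

GAMMA = 0.9  # discount factor from the exercise

# Current state values given in the exercise.
V = {"s12": 5, "s13": 7, "s14": -3, "s17": 2, "s18": 4, "s19": -6}

# Rewards received upon entering a state; all others are 0.
ENTRY_REWARD = {"s14": 10, "s19": -5}

# Transition probabilities for action "up" taken in s18 (assumed layout).
UP_FROM_S18 = {"s13": 0.8, "s12": 0.1, "s14": 0.1}


def q_update(transitions, values, entry_reward, gamma):
    """Return sum over s' of P(s') * (R(s') + gamma * V(s'))."""
    return sum(
        p * (entry_reward.get(s_next, 0) + gamma * values[s_next])
        for s_next, p in transitions.items()
    )


q_s18_up = q_update(UP_FROM_S18, V, ENTRY_REWARD, GAMMA)
print(q_s18_up)
```

The exercise asks only for the unevaluated formula; the code is a check that the three terms (5.04, 0.45, and 0.73) sum to roughly 6.22.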
Artificial Intelligence Foundations Of Computational Agents
ISBN: 9780521519007
1st Edition
Authors: David L. Poole, Alan K. Mackworth