Question:
Suppose we run value iteration in an MDP with only non-negative rewards (that is, R(s, a, s') ≥ 0 for any (s, a, s')). Let the values on the kth iteration be Vk(s) and the optimal values be V∗(s). Initially, the values are 0 (that is, V0(s) = 0 for any s).
a. Mark all of the options that are guaranteed to be true.
(i) For any s, a, s', V1(s) = R(s, a, s')
(ii) For any s, a, s', V1(s) ≤ R(s, a, s')
(iii) For any s, a, s', V1(s) ≥ R(s, a, s')
(iv) None of the above are guaranteed to be true.
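For part (a), it helps to write out the first update explicitly. The sketch below assumes the standard Bellman backup with a transition model T(s, a, s') and discount factor γ, neither of which is named in the question statement; with V0(s) = 0 the discounted term drops out:

```latex
% First value-iteration update with V_0(s) = 0, assuming the standard Bellman
% backup with transition model T(s,a,s') and discount factor \gamma
% (these symbols are standard notation, not given in the question):
\[
V_1(s) \;=\; \max_{a} \sum_{s'} T(s,a,s')\,\bigl[ R(s,a,s') + \gamma\, V_0(s') \bigr]
       \;=\; \max_{a} \sum_{s'} T(s,a,s')\, R(s,a,s').
\]
```

Note that the right-hand side is an expectation over next states under the maximizing action, not a single reward term.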
b. Mark all of the options that are guaranteed to be true.
(i) For any k, s, Vk(s) = V∗(s)
(ii) For any k, s, Vk(s) ≤ V∗(s)
(iii) For any k, s, Vk(s) ≥ V∗(s)
(iv) None of the above are guaranteed to be true.
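For part (b), the following is a minimal sketch of value iteration on a small made-up two-state MDP; the states, actions, transition probabilities, rewards, and the discount factor 0.9 are illustrative assumptions, not part of the question. It starts from V0(s) = 0 with non-negative rewards, matching the question's setup, and prints the values after each update.

```python
# Minimal value-iteration sketch on a tiny, made-up MDP (two states, two
# actions). All names and numbers are illustrative, not from the question.
# Rewards are non-negative and V0(s) = 0, matching the question's setup.

GAMMA = 0.9

# T[s][a] is a list of (next_state, probability, reward) triples.
T = {
    "A": {
        "go":   [("B", 1.0, 4.0)],
        "stay": [("A", 1.0, 1.0)],
    },
    "B": {
        "go":   [("A", 0.5, 0.0), ("B", 0.5, 2.0)],
        "stay": [("B", 1.0, 0.0)],
    },
}

def bellman_backup(V):
    """One round of value iteration:
    V_{k+1}(s) = max_a sum_{s'} T(s,a,s') * [R(s,a,s') + GAMMA * V_k(s')]."""
    return {
        s: max(
            sum(p * (r + GAMMA * V[s2]) for s2, p, r in outcomes)
            for outcomes in actions.values()
        )
        for s, actions in T.items()
    }

V = {s: 0.0 for s in T}  # V0(s) = 0 for all s
for k in range(1, 6):
    V = bellman_backup(V)
    print(f"V{k}: " + ", ".join(f"{s}={v:.3f}" for s, v in V.items()))
```

Running the sketch shows the values increasing from one iteration to the next, which is the monotone behaviour from below that part (b) asks about.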
Step by Step Answer:
a. Only (iv) is guaranteed. Since V0(s) = 0, the first update gives V1(s) = max_a Σ_s' T(s, a, s') R(s, a, s'), an expectation over next states under the best action (see the update written out above). For an arbitrary triple (s, a, s') this expectation can be larger or smaller than the single reward R(s, a, s'): if the best action leads with probability 0.5 to a reward of 0 and with probability 0.5 to a reward of 10, then V1(s) = 5, which exceeds one of those rewards and falls short of the other. So none of (i)–(iii) holds in general.
b. Only (ii) is guaranteed: Vk(s) ≤ V∗(s) for all k and s. Vk(s) is the best expected discounted reward obtainable in k steps; because every reward is non-negative, allowing more steps can only help, so 0 = V0(s) ≤ V1(s) ≤ ... ≤ V∗(s). Equality for every k fails whenever some positive reward is reachable (then V0(s) = 0 < V∗(s)), which rules out (i) and (iii).