Question: 4. For the following simplified grid world, assume that each on-grid transition yields a reward of -1, each off-grid transition yields a reward of -10 with no state change, and the discount factor is 1. Calculate and compare the state values using the Bellman equation for the following policies:
a. π(a|s) = 0.25 for a = left, right, up, and down;
b. π(a|s) = 0.5 for a = left and up; π(a|s) = 0 for a = right and down.
[Grid figure not recovered; cell labels as extracted: 4 8 1 LO 2 6 9 10]
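The grid layout did not survive extraction, so the question cannot be solved exactly as posed. What follows is only a minimal sketch of how the Bellman expectation equation, v(s) = Σ_a π(a|s) [r + γ v(s')], can be evaluated by repeated sweeps for both policies. It assumes a hypothetical 2x2 grid with a terminal cell in the top-left corner. A terminal state is an assumption not stated in the extracted text, but without one a discount factor of 1 with strictly negative rewards would make every value diverge. The names `evaluate_policy`, `uniform`, and `left_up` are illustrative.

```python
# Iterative policy evaluation on a HYPOTHETICAL grid.
# Assumptions (not from the original question): 2x2 layout, terminal cell at (0, 0).
ACTIONS = {"left": (0, -1), "right": (0, 1), "up": (-1, 0), "down": (1, 0)}


def evaluate_policy(rows, cols, terminal, policy, gamma=1.0,
                    tol=1e-10, max_sweeps=100_000):
    """Return {state: value} for a fixed policy given as {action: probability}."""
    v = {(r, c): 0.0 for r in range(rows) for c in range(cols)}
    for _ in range(max_sweeps):
        delta = 0.0
        for s in v:
            if s == terminal:
                continue  # the terminal value stays at 0
            new_v = 0.0
            for action, prob in policy.items():
                dr, dc = ACTIONS[action]
                nxt = (s[0] + dr, s[1] + dc)
                if nxt in v:
                    reward, s_next = -1.0, nxt   # on-grid move
                else:
                    reward, s_next = -10.0, s    # off-grid: penalty, no state change
                new_v += prob * (reward + gamma * v[s_next])
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v  # in-place update
        if delta < tol:
            break
    return v


uniform = {a: 0.25 for a in ACTIONS}                                   # policy a
left_up = {"left": 0.5, "up": 0.5, "right": 0.0, "down": 0.0}          # policy b

v_a = evaluate_policy(2, 2, (0, 0), uniform)
v_b = evaluate_policy(2, 2, (0, 0), left_up)

for state in sorted(v_a):
    print(state, round(v_a[state], 3), round(v_b[state], 3))
```

On this assumed grid, the uniform policy converges to about -33 for the two cells next to the terminal and -44 for the far corner, while the left/up policy gives -11 and -12. The second policy scores better here because it never moves away from the assumed terminal corner and incurs fewer off-grid penalties. The same function should apply to the real grid once its dimensions and terminal cell are known.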
