Question: 1 Consider the 3 3 world shown in Figure 14(a). The transition model is the same as in the 43 Figure 1: 80% of
1 Consider the 3 × 3 world shown in Figure 14(a). The transition model is the same as in the 4×3 Figure 1: 80% of the time the agent goes in the direction it selects; the rest of the time it moves at right angles to the intended direction.
Implement value iteration for this world for each value of r below. Use discounted rewards with a discount factor of 0.99. Show the policy obtained in each case. Explain intuitively why the value of r leads to each policy.
a. r = 100
b. r = −3
c. r = 0
d. r = +3

r -1 +10 +50 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Start -1 -1 -1 -50 +1 +1 +1 +1 +1 +1 +1 (a) (b) Figure 14 (a) 3 x 3 world for Exercise 8. The reward for each state is indicated. The upper right square is a terminal state. (b) 101 x 3 world for Exercise 9 (omitting 93 identical columns in the middle). The start state has reward 0.
Step by Step Solution
There are 3 Steps involved in it
Get step-by-step solutions from verified subject matter experts
