1 Consider the 3 3 world shown in Figure 14(a). The transition model is the same...
Question:
1 Consider the 3 × 3 world shown in Figure 14(a). The transition model is the same as in the 4×3 Figure 1: 80% of the time the agent goes in the direction it selects; the rest of the time it moves at right angles to the intended direction.
Implement value iteration for this world for each value of r below. Use discounted rewards with a discount factor of 0.99. Show the policy obtained in each case. Explain intuitively why the value of r leads to each policy.
a. r = 100
b. r = −3
c. r = 0
d. r = +3
Fantastic news! We've Found the answer you've been seeking!
Step by Step Answer:
Related Book For
Question Posted: