1 Consider the 3 3 world shown in Figure 14(a). The transition model is the same...

Question:

1 Consider the 3 × 3 world shown in Figure 14(a). The transition model is the same as in the 4×3 Figure 1: 80% of the time the agent goes in the direction it selects; the rest of the time it moves at right angles to the intended direction.

Implement value iteration for this world for each value of r below. Use discounted rewards with a discount factor of 0.99. Show the policy obtained in each case. Explain intuitively why the value of r leads to each policy.

a. r = 100

b. r = −3

c. r = 0

d. r = +3

image text in transcribed

Fantastic news! We've Found the answer you've been seeking!

Step by Step Answer:

Related Book For  book-img-for-question
Question Posted: