Question: 1 Consider the 3 3 world shown in Figure 14(a). The transition model is the same as in the 43 Figure 1: 80% of

1 Consider the 3 × 3 world shown in Figure 14(a). The transition model is the same as in the 4×3 Figure 1: 80% of the time the agent goes in the direction it selects; the rest of the time it moves at right angles to the intended direction.

Implement value iteration for this world for each value of r below. Use discounted rewards with a discount factor of 0.99. Show the policy obtained in each case. Explain intuitively why the value of r leads to each policy.

a. r = 100

b. r = −3

c. r = 0

d. r = +3

r -1 +10 +50 -1 -1 -1 -1 -1 -1 -1 -1

r -1 +10 +50 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Start -1 -1 -1 -50 +1 +1 +1 +1 +1 +1 +1 (a) (b) Figure 14 (a) 3 x 3 world for Exercise 8. The reward for each state is indicated. The upper right square is a terminal state. (b) 101 x 3 world for Exercise 9 (omitting 93 identical columns in the middle). The start state has reward 0.

Step by Step Solution

There are 3 Steps involved in it

1 Expert Approved Answer
Step: 1 Unlock blur-text-image
Question Has Been Solved by an Expert!

Get step-by-step solutions from verified subject matter experts

Step: 2 Unlock
Step: 3 Unlock

Students Have Also Explored These Related Artificial Intelligence Modern Questions!