
Question:

Exercise 9.9 Consider a game world:

The robot can be at one of the 25 locations on the grid. There can be a treasure on one of the circles at the corners. When the robot reaches the corner where the treasure is, it collects a reward of 10, and the treasure disappears. When there is no treasure, at each time step, there is a probability p1 = 0.2 that a treasure appears, and it appears with equal probability at each corner. The robot knows its position and the location of the treasure.
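The treasure dynamics above can be sketched as a small transition function. This is only an illustrative sketch: the name `treasure_transition`, the corner labels, and the use of `None` for "no treasure" are assumptions, not part of the exercise. It also assumes that a treasure stays in place until collected, and it leaves collection by the robot out.

```python
# Hypothetical labels for the four corners; the exercise only says "corners".
CORNERS = ("top_left", "top_right", "bottom_left", "bottom_right")


def treasure_transition(treasure, p1=0.2):
    """Return {next_treasure: probability} for one time step.

    `treasure` is None (no treasure) or one of CORNERS.
    Collection by the robot is not modeled here.
    """
    if treasure is not None:
        # Assumption: an existing treasure stays put until it is collected.
        return {treasure: 1.0}
    # No treasure: one appears with probability p1, uniformly over corners.
    dist = {None: 1.0 - p1}
    for corner in CORNERS:
        dist[corner] = p1 / len(CORNERS)
    return dist
```

With p1 = 0.2, each corner receives probability 0.05 and the no-treasure state persists with probability 0.8.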

There are monsters at the squares marked with an X. Each monster randomly and independently, at each time step, checks if the robot is on its square. If the robot is on the square when the monster checks, it has a reward of −10 (i.e., it loses 10 points). At the center point, the monster checks at each time step with probability p2 = 0.4; at the other 4 squares marked with an X, the monsters check at each time step with probability p3 = 0.2.
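Because each monster checks independently at every step, the monsters can be folded into an expected reward per square. The sketch below assumes hypothetical names (`MONSTER_CHECK_PROB`, `expected_monster_reward`) and only distinguishes the center square from the other four, since the exact positions are given by the figure.

```python
# Check probabilities from the exercise: p2 for the center, p3 for the others.
MONSTER_CHECK_PROB = {"center": 0.4, "other": 0.2}


def expected_monster_reward(check_probability, penalty=-10.0):
    """Expected reward from a monster's check when the robot is on its square."""
    return check_probability * penalty
```

Under these numbers the expected penalty is −4 on the center square and −2 on each of the other monster squares.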

Assume that the rewards are immediate upon entering a state: that is, if the robot enters a state with a monster, it gets the (negative) reward on entering the state, and if the robot enters the state with a treasure, it gets the reward upon entering the state, even if the treasure arrives at the same time.

The robot has 8 actions corresponding to the 8 neighboring squares. The diagonal moves are noisy; there is a p4 = 0.6 probability of going in the direction chosen and an equal chance of going to each of the four neighboring squares closest to the desired direction. The vertical and horizontal moves are also noisy; there is a p5 = 0.8 chance of going in the requested direction and an equal chance of going to one of the adjacent diagonal squares. For example, the action up moves the robot up with probability 0.8 and to the up-left or up-right square with probability 0.1 each, while the action up-left moves it up-left with probability 0.6 and to each of the four nearest squares with probability 0.1.

If the action would result in crashing into a wall, the robot has a reward of −2 (i.e., loses 2) and does not move.
There is a discount factor of p6 = 0.9.
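The movement rules can be sketched as two small functions, one for the noise and one for walls. Several things here are assumptions rather than part of the exercise: the names `outcome_distribution` and `apply_move`, the 5×5 layout of the 25 locations, the reading that the "four closest" squares for a diagonal move are the two compass directions on either side of it, and the choice to apply the wall penalty to the direction actually taken.

```python
# Compass directions in clockwise order as (dx, dy), starting from "up".
# Even indices are straight moves, odd indices are diagonal moves.
DIRECTIONS = [(0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1)]
SIZE = 5  # assumed 5x5 layout for the 25 locations


def outcome_distribution(action, p4=0.6, p5=0.8):
    """Return {direction_index: probability} for a requested action index."""
    if action % 2 == 1:
        # Diagonal: assumed to slip to the two directions on either side.
        offsets, p_main = (-2, -1, 1, 2), p4
    else:
        # Straight: slips to the two adjacent diagonals.
        offsets, p_main = (-1, 1), p5
    dist = {action: p_main}
    for off in offsets:
        dist[(action + off) % 8] = (1.0 - p_main) / len(offsets)
    return dist


def apply_move(x, y, direction):
    """Return ((new_x, new_y), wall_reward) for the direction actually taken."""
    dx, dy = DIRECTIONS[direction]
    nx, ny = x + dx, y + dy
    if 0 <= nx < SIZE and 0 <= ny < SIZE:
        return (nx, ny), 0.0
    # Crashing into a wall: the robot stays put and loses 2.
    return (x, y), -2.0
```

Combining the two gives the position part of the transition model: sample or enumerate a direction from `outcome_distribution`, then pass it to `apply_move`.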

(a) How many states are there? (Or how few states can you get away with?)
What do they represent?
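One straightforward encoding, not necessarily the smallest, pairs the robot's position with the treasure status. The monsters need no state of their own because their checks are independent at each time step. The sketch below assumes a 5×5 layout and hypothetical corner labels; `enumerate_states` is an illustrative name.

```python
from itertools import product

SIZE = 5  # assumed 5x5 layout for the 25 locations
# None means "no treasure"; the corner labels are hypothetical.
TREASURE_STATES = (None, "top_left", "top_right", "bottom_left", "bottom_right")


def enumerate_states():
    """List every (x, y, treasure) combination under this encoding."""
    positions = product(range(SIZE), range(SIZE))
    return [(x, y, t) for (x, y), t in product(positions, TREASURE_STATES)]
```

Under this encoding there are 25 × 5 = 125 states; exploiting the symmetry of the grid might reduce that number further.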

(b) What is an optimal policy?
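One common way to compute an optimal policy for a model like this is value iteration. The sketch below is a generic version, not the exercise's official solution: the `transitions(s, a)` callback and its `(probability, next_state, reward)` format are assumptions, and the game's own transition model would have to be supplied by the reader.

```python
def value_iteration(states, actions, transitions, gamma=0.9, tol=1e-8):
    """Return (values, policy) for a finite MDP.

    transitions(s, a) must return a list of (probability, next_state, reward).
    gamma defaults to the exercise's discount factor p6 = 0.9.
    """
    values = {s: 0.0 for s in states}

    def q_value(s, a):
        # Expected immediate reward plus discounted value of the next state.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in transitions(s, a))

    while True:
        delta = 0.0
        for s in states:
            best = max(q_value(s, a) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {s: max(actions, key=lambda a: q_value(s, a)) for s in states}
    return values, policy
```

Rerunning this with different values of p1 to p6 is one way to explore part (c), since the greedy policy can change as the parameters change.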

(c) Suppose the game designer wants to design different instances of the game that have non-obvious optimal policies for a game player. Give three assignments to the parameters p1 to p6 with different optimal policies. If there are not that many different optimal policies, give as many as there are and explain why there are no more than that.
