Question:
In this exercise we will consider two-player MDPs that correspond to zero-sum, turn-taking games like those in Chapter 6. Let the players be A and B, and let R(s) be the reward for player A in state s. (The reward for B is always equal and opposite.)
a. Let UA(s) be the utility of state s when it is A's turn to move in s, and let UB(s) be the utility of state s when it is B's turn to move in s. All rewards and utilities are calculated from A's point of view (just as in a minimax game tree). Write down Bellman equations defining UA(s) and UB(s).
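One consistent way to write these equations, assuming deterministic moves Result(s, a) and a set Actions(s) of legal moves (a sketch of an answer, not the book's official solution), is:

```latex
U_A(s) = R(s) + \max_{a \in \mathrm{Actions}(s)} U_B(\mathrm{Result}(s, a))
\qquad
U_B(s) = R(s) + \min_{a \in \mathrm{Actions}(s)} U_A(\mathrm{Result}(s, a))
```

A maximizes and B minimizes because both utilities are measured from A's point of view; each player's move hands the turn to the other player, which is why UA is defined in terms of UB and vice versa.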
b. Explain how to do two-player value iteration with these equations, and define a suitable stopping criterion.
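As a hedged illustration of part (b), the following sketch alternates the two Bellman updates above and stops when neither utility table changes by more than a tolerance epsilon (a standard stopping criterion). The tiny chain game at the bottom is invented for demonstration; it is not the game in the book's figure, and all names here (`two_player_value_iteration`, `result`, `moves`, `reward`) are hypothetical.

```python
def two_player_value_iteration(states, result, moves, reward, epsilon=1e-6):
    """Iterate the paired Bellman updates for a zero-sum, turn-taking game.

    UA[s]: utility of s when it is A's turn; UB[s]: when it is B's turn.
    Both are from A's point of view, so A maximizes and B minimizes.
    Stops when the largest change in either table is below epsilon.
    """
    UA = {s: 0.0 for s in states}
    UB = {s: 0.0 for s in states}
    while True:
        new_UA, new_UB = {}, {}
        for s in states:
            acts = moves(s)
            if not acts:  # terminal state: utility is just its reward
                new_UA[s] = new_UB[s] = reward(s)
            else:
                # A's move leads to a state where it is B's turn, and
                # vice versa, hence the cross-coupling of the tables.
                new_UA[s] = reward(s) + max(UB[result(s, a)] for a in acts)
                new_UB[s] = reward(s) + min(UA[result(s, a)] for a in acts)
        delta = max(
            max(abs(new_UA[s] - UA[s]) for s in states),
            max(abs(new_UB[s] - UB[s]) for s in states),
        )
        UA, UB = new_UA, new_UB
        if delta < epsilon:
            return UA, UB


# Hypothetical demo game: a chain 0 -> 1 -> 2 -> 3 where the only move
# is "advance"; state 3 is terminal with reward 1, all others reward 0.
states = [0, 1, 2, 3]
UA, UB = two_player_value_iteration(
    states,
    result=lambda s, a: s + 1,
    moves=lambda s: ["advance"] if s < 3 else [],
    reward=lambda s: 1.0 if s == 3 else 0.0,
)
```

On a finite game tree (a DAG of states) the updates settle to exact values after at most as many sweeps as the longest path, so the delta-based stopping criterion is guaranteed to trigger; in the chain demo every state ends up with utility 1, since reaching the terminal state is forced.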
c. Consider the game described in Figure
Artificial Intelligence: A Modern Approach
ISBN: 9780137903955
2nd Edition
Authors: Stuart Russell, Peter Norvig