Question:
In this exercise we will consider two-player MDPs that correspond to zero-sum, turn-taking games like those in Chapter 6. Let the players be A and B, and let R(s) be the reward for player A in state s. (The reward for B is always equal and opposite.)
a. Let UA(s) be the utility of state s when it is A's turn to move in s, and let UB(s) be the utility of state s when it is B's turn to move in s. All rewards and utilities are calculated from A's point of view (just as in a minimax game tree). Write down Bellman equations defining UA(s) and UB(s).
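One consistent way to write these equations, assuming deterministic moves Result(s, a) and a set Actions(s) of legal moves (a sketch of an answer, not the book's official solution), is:

```latex
U_A(s) = R(s) + \max_{a \in \mathrm{Actions}(s)} U_B(\mathrm{Result}(s, a))
\qquad
U_B(s) = R(s) + \min_{a \in \mathrm{Actions}(s)} U_A(\mathrm{Result}(s, a))
```

A maximizes and B minimizes because both utilities are measured from A's point of view; each player's move hands the turn to the other player, which is why UA is defined in terms of UB and vice versa.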
b. Explain how to do two-player value iteration with these equations, and define a suitable stopping criterion.
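As a hedged illustration of part (b), the following sketch alternates the two Bellman updates above and stops when neither utility table changes by more than a tolerance epsilon (a standard stopping criterion). The tiny chain game at the bottom is invented for demonstration; it is not the game in the book's figure, and all names here (`two_player_value_iteration`, `result`, `moves`, `reward`) are hypothetical.

```python
def two_player_value_iteration(states, result, moves, reward, epsilon=1e-6):
    """Iterate the paired Bellman updates for a zero-sum, turn-taking game.

    UA[s]: utility of s when it is A's turn; UB[s]: when it is B's turn.
    Both are from A's point of view, so A maximizes and B minimizes.
    Stops when the largest change in either table is below epsilon.
    """
    UA = {s: 0.0 for s in states}
    UB = {s: 0.0 for s in states}
    while True:
        new_UA, new_UB = {}, {}
        for s in states:
            acts = moves(s)
            if not acts:  # terminal state: utility is just its reward
                new_UA[s] = new_UB[s] = reward(s)
            else:
                # A's move leads to a state where it is B's turn, and
                # vice versa, hence the cross-coupling of the tables.
                new_UA[s] = reward(s) + max(UB[result(s, a)] for a in acts)
                new_UB[s] = reward(s) + min(UA[result(s, a)] for a in acts)
        delta = max(
            max(abs(new_UA[s] - UA[s]) for s in states),
            max(abs(new_UB[s] - UB[s]) for s in states),
        )
        UA, UB = new_UA, new_UB
        if delta < epsilon:
            return UA, UB


# Hypothetical demo game: a chain 0 -> 1 -> 2 -> 3 where the only move
# is "advance"; state 3 is terminal with reward 1, all others reward 0.
states = [0, 1, 2, 3]
UA, UB = two_player_value_iteration(
    states,
    result=lambda s, a: s + 1,
    moves=lambda s: ["advance"] if s < 3 else [],
    reward=lambda s: 1.0 if s == 3 else 0.0,
)
```

On a finite game tree (a DAG of states) the updates settle to exact values after at most as many sweeps as the longest path, so the delta-based stopping criterion is guaranteed to trigger; in the chain demo every state ends up with utility 1, since reaching the terminal state is forced.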
c. Consider the game described in Figure
Artificial Intelligence: A Modern Approach
ISBN: 9780137903955
2nd Edition
Authors: Stuart Russell, Peter Norvig