Question:

Consider the following alternative ways to update the probability P in the stochastic policy iteration algorithm of Figure 14.10 (page 634).

(i) Give more recent experiences more weight by multiplying the counts in P by (1 − β), for small β (such as 0.01), before adding 1 to the count of the best action.

(ii) Add some small value (such as 0.01 or 0.001) to the probability of the best action, and subtract from the probabilities of the other actions so that the probabilities remain non-negative and sum to 1.

(a) Which of the original, (i), or (ii) has the best payoffs for the game of Example 14.15 (page 627), where there is a unique Nash equilibrium but another strategy profile has a better payoff for both agents?

(b) Which one has the best payoffs in the penalty kick game of Example 14.9 (page 621) when played against the others?

(c) Which of the original and the alternatives, if any, converge to a Nash equilibrium in the strategies played (averaging over all of the actions played)? (Do this experimentally, creating hypotheses from your observations, and then try to prove your hypotheses.)
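The two update rules, together with a harness for the experiment in part (c), can be sketched in Python. This is a minimal sketch under stated assumptions: the penalty kick payoff matrix below is illustrative (the numbers in Example 14.9 may differ), and a fictitious-play-style best response to the opponent's empirical action frequencies stands in for however Figure 14.10 determines the "best action".

```python
import random

def update_counts_decay(counts, best, beta=0.01):
    """Variant (i): decay every count by (1 - beta), then add 1 to the
    best action's count; P is the normalised counts."""
    new = [(1 - beta) * c for c in counts]
    new[best] += 1.0
    return new

def update_probs_step(probs, best, delta=0.01):
    """Variant (ii): add delta to the best action's probability, take it
    evenly from the others, clip at zero, and renormalise."""
    p = list(probs)
    others = [i for i in range(len(p)) if i != best]
    for i in others:
        p[i] = max(0.0, p[i] - delta / len(others))
    p[best] += delta
    total = sum(p)
    return [x / total for x in p]

# Illustrative zero-sum penalty kick payoffs for the kicker
# (rows: kick left/right; columns: goalkeeper dives left/right).
# The payoffs in Example 14.9 may differ.
KICKER = [[-1.0, 1.0],
          [1.0, -1.0]]

def simulate(steps=5000, beta=0.01, seed=0):
    """Play the game repeatedly with variant (i) and return each agent's
    empirical (average) strategy, the quantity asked about in part (c)."""
    rng = random.Random(seed)
    # Goalkeeper payoffs are the negation of the kicker's (zero-sum).
    payoff = [KICKER, [[-KICKER[k][g] for k in (0, 1)] for g in (0, 1)]]
    counts = [[1.0, 1.0], [1.0, 1.0]]   # one counter per agent
    freq = [[0, 0], [0, 0]]             # actions actually played
    for _ in range(steps):
        for ag in (0, 1):
            p_left = counts[ag][0] / sum(counts[ag])
            freq[ag][0 if rng.random() < p_left else 1] += 1
        for ag in (0, 1):
            # Best response to the opponent's empirical frequencies
            # (a stand-in for the algorithm's own bookkeeping).
            opp = freq[1 - ag]
            exp = [sum(payoff[ag][a][o] * opp[o] for o in (0, 1))
                   for a in (0, 1)]
            best = 0 if exp[0] >= exp[1] else 1
            counts[ag] = update_counts_decay(counts[ag], best, beta)
    return [[f / steps for f in fs] for fs in freq]
```

For parts (a) and (b), one can swap in the payoff matrices of Examples 14.15 and 14.9 and replace `update_counts_decay` with the original unit-increment update, or with `update_probs_step` maintained directly on probabilities, to compare the three schemes head to head.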

