Question:

Exercise 11.8 Compare the different parameter settings for the game of Example 11.8 (page 464). In particular compare the following situations:

(a) α varies, and the Q-values are initialized to 0.0.

(b) α varies, and the Q-values are initialized to 5.0.

(c) α is fixed to 0.1, and the Q-values are initialized to 0.0.

(d) α is fixed to 0.1, and the Q-values are initialized to 5.0.

(e) Some other parameter settings.

For each of these, carry out multiple runs and compare the distributions of minimum values, zero crossing, the asymptotic slope for the policy that includes exploration, and the asymptotic slope for the policy that does not include exploration.

To do the last task, after the algorithm has converged, set the exploitation parameter to 100% and run a large number of additional steps.
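
One way to organize the comparison is sketched below. It is only a sketch under stated assumptions. The game of Example 11.8 is not reproduced here, so `toy_step` is a hypothetical two-state stand-in that you would replace with the real game. "α varies" is assumed to mean α_k = 1/k, where k counts the visits to a state-action pair. The four statistics are assumed to be read off the accumulated-reward curve. The exploration rate, discount, step counts and seeds are arbitrary illustrative choices.

```python
import random

def toy_step(state, action):
    """Hypothetical two-state stand-in environment, NOT the game of Example 11.8."""
    if state == 0:
        return action, -1.0              # action 1 moves to state 1; every move costs 1
    return (0, 10.0) if action == 1 else (1, -1.0)

def run(alpha, q_init, learn_steps=2000, greedy_steps=2000,
        explore=0.1, gamma=0.9, seed=0):
    """Q-learning; returns the accumulated-reward curve.
    alpha=None is taken to mean alpha_k = 1/k (an assumption about 'alpha varies')."""
    rng = random.Random(seed)
    q = [[q_init, q_init] for _ in range(2)]
    visits = [[0, 0] for _ in range(2)]
    state, total, curve = 0, 0.0, []
    for t in range(learn_steps + greedy_steps):
        # After the learning phase, exploit 100% of the time.
        eps = explore if t < learn_steps else 0.0
        if rng.random() < eps:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: q[state][a])
        nxt, reward = toy_step(state, action)
        visits[state][action] += 1
        step = alpha if alpha is not None else 1.0 / visits[state][action]
        q[state][action] += step * (reward + gamma * max(q[nxt]) - q[state][action])
        state, total = nxt, total + reward
        curve.append(total)
    return curve

def summarize(curve, learn_steps=2000, tail=500):
    """Statistics of one run, taken from the accumulated-reward curve."""
    learn = curve[:learn_steps]
    minimum = min(learn)
    start = learn.index(minimum)
    # First step at or after the minimum where accumulated reward is non-negative.
    crossing = next((i for i in range(start, len(learn)) if learn[i] >= 0), None)
    return {
        "minimum": minimum,
        "zero_crossing": crossing,
        "slope_explore": (learn[-1] - learn[-1 - tail]) / tail,
        "slope_greedy": (curve[-1] - learn[-1]) / (len(curve) - learn_steps),
    }

# Settings (a)-(d) as (alpha, initial Q-value); add further entries for (e).
settings = {"a": (None, 0.0), "b": (None, 5.0), "c": (0.1, 0.0), "d": (0.1, 5.0)}
results = {name: [summarize(run(alpha, q0, seed=s)) for s in range(5)]
           for name, (alpha, q0) in settings.items()}
for name, rows in results.items():
    print(name, [row["minimum"] for row in rows],
          [row["slope_greedy"] for row in rows])
```

Each entry of `results` holds one summary per seed, so the lists can be compared across settings directly or passed to a histogram. Five seeds is too few to say much about a distribution; increase the number of runs before drawing conclusions.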
