Question:
Exercise 11.9 Consider four different ways to derive the value of αk from k in Q-learning
(note that for Q-learning with varying αk, there must be a different count k for each state–action pair).
i) Let αk = 1/k.
ii) Let αk = 10/(9 + k).
iii) Let αk = 0.1.
iv) Let αk = 0.1 for the first 10,000 steps, αk = 0.01 for the next 10,000 steps,
αk = 0.001 for the next 10,000 steps, αk = 0.0001 for the next 10,000 steps, and so on.
(a) Which of these will converge to the true Q-value in theory?
(b) Which converges to the true Q-value in practice (i.e., in a reasonable number of steps)? Try it for more than one domain.
(c) Which can adapt when the environment changes slowly?
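As a possible starting point for part (b), the four schedules can be written as functions of the count k and compared on a toy problem. The sketch below is not from the book: the function names, the Gaussian sample stream, and the seed are all illustrative assumptions. It also replaces Q-learning with a simpler running-average update, estimate += αk * (sample - estimate), which has the same form as the Q-learning update when the target is a noisy sample of a fixed value. A full answer would still need a real Q-learner with a separate count k per state–action pair.

```python
import random


def alpha_i(k):
    """Schedule (i): alpha_k = 1/k."""
    return 1.0 / k


def alpha_ii(k):
    """Schedule (ii): alpha_k = 10/(9 + k)."""
    return 10.0 / (9 + k)


def alpha_iii(k):
    """Schedule (iii): constant alpha_k = 0.1."""
    return 0.1


def alpha_iv(k):
    """Schedule (iv): 0.1 for steps 1..10,000, then divided by 10 every 10,000 steps."""
    return 0.1 ** ((k - 1) // 10_000 + 1)


# Hypothetical registry used only by this sketch.
SCHEDULES = {"i": alpha_i, "ii": alpha_ii, "iii": alpha_iii, "iv": alpha_iv}


def running_estimate(alpha, samples, initial=0.0):
    """Apply estimate += alpha(k) * (sample - estimate) for k = 1, 2, ...

    This is a stand-in for the Q-learning update (an assumption of this
    sketch), with each sample playing the role of the update target.
    """
    estimate = initial
    for k, x in enumerate(samples, start=1):
        estimate += alpha(k) * (x - estimate)
    return estimate


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    # Assumed toy target: noisy samples around a true value of 5.0.
    samples = [rng.gauss(5.0, 1.0) for _ in range(50_000)]
    for name, alpha in SCHEDULES.items():
        print(name, running_estimate(alpha, samples))
```

For part (a), one standard sufficient condition to check each schedule against is that the step sizes sum to infinity while their squares sum to a finite value. For part (c), the same harness could be rerun with a sample stream whose mean drifts over time.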
Artificial Intelligence: Foundations of Computational Agents
ISBN: 9780521519007
1st Edition
Authors: David L. Poole, Alan K. Mackworth