Question:
Consider four different ways to derive the value of α_k from k in Q-learning (note that for Q-learning with varying α_k, there must be a different count k for each state–action pair).
(i) Let α_k = 1/k.
(ii) Let α_k = 10/(9 + k).
(iii) Let α_k = 0.1.
(iv) Let α_k = 0.1 for the first 10,000 steps, α_k = 0.01 for the next 10,000 steps, α_k = 0.001 for the next 10,000 steps, α_k = 0.0001 for the next 10,000 steps, and so on.
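The four schedules, together with the per-pair count k that the question mentions, can be written as a minimal Python sketch. The `alpha_*` functions and the `StepSizes` class are hypothetical names, and schedule (iv) is read here as a function of the pair's own count k, with the 10,000-step stage length exposed as a `block` parameter.

```python
from collections import defaultdict
from typing import Callable


def alpha_i(k: int) -> float:
    """(i) alpha_k = 1/k."""
    return 1.0 / k


def alpha_ii(k: int) -> float:
    """(ii) alpha_k = 10/(9 + k); equals 1 at k = 1, then decays roughly like 10/k."""
    return 10.0 / (9 + k)


def alpha_iii(k: int) -> float:
    """(iii) constant alpha_k = 0.1, independent of k."""
    return 0.1


def alpha_iv(k: int, block: int = 10_000) -> float:
    """(iv) 0.1 for k = 1..block, then divided by 10 after every further `block` counts."""
    stage = (k - 1) // block  # stage 0 covers k = 1..block, stage 1 the next block, ...
    return 0.1 * 0.1 ** stage


class StepSizes:
    """Hypothetical helper that keeps a separate visit count k per (state, action) pair."""

    def __init__(self, schedule: Callable[[int], float]) -> None:
        self.schedule = schedule
        self.counts = defaultdict(int)  # (state, action) -> number of updates so far

    def next_alpha(self, state, action) -> float:
        """Record one more update of (state, action) and return that pair's alpha_k."""
        self.counts[(state, action)] += 1
        return self.schedule(self.counts[(state, action)])
```

In a Q-learning loop, `steps.next_alpha(s, a)` would be called once per update of Q(s, a), so a rarely visited pair keeps a large step size even after many total steps.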
(a) Which of these will converge to the true Q-value in theory?
(b) Which converges to the true Q-value in practice (i.e., in a reasonable number of steps)? Try it for more than one domain.
(c) Which are able to adapt if the environment changes slowly?
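For (a), a sketch of the usual theoretical argument: tabular Q-learning with discount γ < 1 and bounded rewards converges to Q* with probability 1 when every state–action pair is updated infinitely often and each pair's step sizes satisfy the stochastic-approximation (Robbins–Monro) conditions in the first line below. These are sufficient conditions, k is read as each pair's own count, and this is not the book's worked solution.

```latex
\begin{aligned}
&\sum_{k=1}^{\infty} \alpha_k = \infty, \qquad \sum_{k=1}^{\infty} \alpha_k^2 < \infty \\
&\text{(i)}\;\; \sum_{k} \tfrac{1}{k} = \infty, \qquad \sum_{k} \tfrac{1}{k^2} = \tfrac{\pi^2}{6} < \infty \\
&\text{(ii)}\;\; \sum_{k} \tfrac{10}{9+k} = 10 \sum_{j=10}^{\infty} \tfrac{1}{j} = \infty, \qquad \sum_{k} \tfrac{100}{(9+k)^2} < \infty \\
&\text{(iii)}\;\; \sum_{k} 0.1 = \infty, \qquad \sum_{k} 0.01 = \infty \quad (\text{second condition fails}) \\
&\text{(iv)}\;\; \sum_{k} \alpha_k = 10^4 \sum_{m=1}^{\infty} 10^{-m} = \tfrac{10^4}{9} \approx 1111.1 < \infty \quad (\text{first condition fails})
\end{aligned}
```

Failing a sufficient condition is not by itself a proof of non-convergence. Still, with (iii) the estimate keeps fluctuating whenever rewards or transitions are stochastic, and with (iv) the total step size is finite, so the initial value and early noisy samples keep a nonzero weight forever and the estimate generally settles somewhere other than Q*.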
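For (b), one way to "try it for more than one domain" is a small experiment like the following minimal sketch. It assumes two hypothetical toy domains (a one-state domain with a noisy action and a three-state chain), γ = 0.9, Q initialised to 0, and a uniformly random behaviour policy so that every pair keeps being updated. It reports the largest absolute error against Q* computed by value iteration; the domains, function names, and parameters are illustrative assumptions, not from the book.

```python
import random

# Hypothetical toy domains, not from the book. P[s][a] is a list of
# (probability, next_state, reward) outcomes for taking action a in state s.
ONE_STATE = [
    [[(0.5, 0, 1.0), (0.5, 0, 0.0)],  # action 0: reward 1 or 0, mean 0.5
     [(1.0, 0, 0.4)]],                # action 1: deterministic reward 0.4
]
CHAIN = [
    [[(1.0, 0, 0.0)], [(0.9, 1, 0.0), (0.1, 0, 0.0)]],  # state 0: left (stays) / try right
    [[(1.0, 0, 0.0)], [(0.9, 2, 0.0), (0.1, 1, 0.0)]],  # state 1: left / try right
    [[(1.0, 1, 0.0)], [(0.5, 2, 2.0), (0.5, 2, 0.0)]],  # state 2: left / noisy reward
]

SCHEDULES = {
    "i": lambda k: 1.0 / k,
    "ii": lambda k: 10.0 / (9 + k),
    "iii": lambda k: 0.1,
    "iv": lambda k: 0.1 * 0.1 ** ((k - 1) // 10_000),
}


def true_q(P, gamma, tol=1e-10):
    """Q* by value iteration; this needs the model, unlike Q-learning."""
    q = [[0.0] * len(acts) for acts in P]
    while True:
        new = [[sum(p * (r + gamma * max(q[s2])) for p, s2, r in outcomes)
                for outcomes in acts] for acts in P]
        delta = max(abs(new[s][a] - q[s][a])
                    for s in range(len(P)) for a in range(len(P[s])))
        q = new
        if delta < tol:
            return q


def sample(outcomes, rng):
    """Draw (next_state, reward) from a list of (probability, next_state, reward)."""
    u, acc = rng.random(), 0.0
    for p, s2, r in outcomes:
        acc += p
        if u < acc:
            return s2, r
    return outcomes[-1][1], outcomes[-1][2]  # guard against rounding in the sum


def q_learning_error(P, alpha, steps, gamma=0.9, seed=0):
    """Tabular Q-learning with a per-(state, action) count k; returns max |Q - Q*|."""
    rng = random.Random(seed)
    q = [[0.0] * len(acts) for acts in P]
    counts = [[0] * len(acts) for acts in P]
    s = 0
    for _ in range(steps):
        a = rng.randrange(len(P[s]))  # uniformly random exploration
        s2, r = sample(P[s][a], rng)
        counts[s][a] += 1
        target = r + gamma * max(q[s2])
        q[s][a] += alpha(counts[s][a]) * (target - q[s][a])
        s = s2
    qstar = true_q(P, gamma)
    return max(abs(q[x][b] - qstar[x][b])
               for x in range(len(P)) for b in range(len(P[x])))


if __name__ == "__main__":
    for name, P in [("one-state", ONE_STATE), ("chain", CHAIN)]:
        for label, alpha in SCHEDULES.items():
            print(name, label, round(q_learning_error(P, alpha, steps=50_000), 3))
```

Varying `seed`, `gamma`, and `steps` is a cheap way to see how sensitive each schedule is. Note that any pair updated fewer than 10,000 times never leaves the first stage of schedule (iv), so short runs cannot distinguish it from (iii).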
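For (c), a minimal sketch under a simplified non-stationary setting: a single state–action pair with γ = 0, so the Q-value is just the expected reward and the Q-learning update reduces to q ← q + α_k (r − q). The reward mean drifts upward slowly and each sample has Gaussian noise; `tracking_error`, `drift`, and `noise` are hypothetical names and values chosen for illustration.

```python
import random

SCHEDULES = {
    "i": lambda k: 1.0 / k,
    "ii": lambda k: 10.0 / (9 + k),
    "iii": lambda k: 0.1,
    "iv": lambda k: 0.1 * 0.1 ** ((k - 1) // 10_000),
}


def tracking_error(alpha, steps=50_000, drift=1e-4, noise=0.5, seed=1):
    """Return |q - current mean| after `steps` updates of a slowly drifting target."""
    rng = random.Random(seed)
    q, mean = 0.0, 0.0
    for k in range(1, steps + 1):
        mean += drift                 # the environment changes slowly
        r = rng.gauss(mean, noise)    # noisy reward sample around the current mean
        q += alpha(k) * (r - q)       # Q-learning update with gamma = 0
    return abs(q - mean)


if __name__ == "__main__":
    for label, alpha in SCHEDULES.items():
        print(label, round(tracking_error(alpha), 3))
```

With α_k = 1/k the estimate is exactly the average of all samples so far, so every sample is weighted equally and old ones never fade. A constant α gives the sample from j steps ago weight α(1 − α)^j, which is what lets the estimate follow a drifting target.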
Related book: Artificial Intelligence: Foundations of Computational Agents, 3rd Edition, by David L. Poole and Alan K. Mackworth (ISBN 9781009258197).