Exercise 11.9 Consider four different ways to derive the value of α_k from k in Q-learning (note that for Q-learning with varying α_k, there must be a different count k for each state–action pair).

i) Let α_k = 1/k.

ii) Let α_k = 10/(9 + k).

iii) Let α_k = 0.1.

iv) Let α_k = 0.1 for the first 10,000 steps, α_k = 0.01 for the next 10,000 steps, α_k = 0.001 for the next 10,000 steps, α_k = 0.0001 for the next 10,000 steps, and so on.
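The four step-size rules and the per-pair count mentioned in the note can be sketched in Python as follows. This is a minimal illustration, not the book's code: the function names, the `q_update` helper, the dictionary-based Q-table, and the discount `gamma=0.9` are all assumptions made for the example. It also assumes that k starts at 1 and that the "steps" in rule iv are counted by the same per-pair k.

```python
from collections import defaultdict

def alpha_i(k):
    """Rule i: 1/k."""
    return 1.0 / k

def alpha_ii(k):
    """Rule ii: 10/(9 + k)."""
    return 10.0 / (9 + k)

def alpha_iii(k):
    """Rule iii: constant 0.1."""
    return 0.1

def alpha_iv(k):
    """Rule iv: 0.1 for k = 1..10000, then divided by 10 every 10,000 steps.
    Assumption: the 'steps' are counted by the per-pair k, starting at 1."""
    return 0.1 ** (1 + (k - 1) // 10000)

def q_update(Q, counts, s, a, r, s_next, actions, alpha_fn, gamma=0.9):
    """One tabular Q-learning update (hypothetical helper).
    counts holds a separate k for each (state, action) pair."""
    counts[(s, a)] += 1
    k = counts[(s, a)]
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha_fn(k) * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Usage: Q-values default to 0.0 and counts to 0.
Q = defaultdict(float)
counts = defaultdict(int)
q_update(Q, counts, "s0", "a0", 1.0, "s1", ["a0", "a1"], alpha_ii)
```

Passing the step-size rule in as `alpha_fn` makes it easy to rerun the same domain with each of the four rules.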

(a) Which of these will converge to the true Q-value in theory?

(b) Which converges to the true Q-value in practice (i.e., in a reasonable number of steps)? Try it for more than one domain.

(c) Which can adapt when the environment adapts slowly?
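For parts (b) and (c), one way to start experimenting is with a stripped-down setting: a single running estimate updated as q ← q + α_k (x_k − q), standing in for one Q-value. The sketch below is only a simplified harness under that assumption, not full Q-learning; `run_estimate`, `noisy_samples`, the Gaussian noise, and the linearly drifting mean are all made up for illustration. Any step-size rule can be passed in as a function of k.

```python
import random

def run_estimate(alpha_fn, samples):
    """Apply q <- q + alpha_k * (x_k - q) over the samples, with k starting at 1."""
    q = 0.0
    for k, x in enumerate(samples, start=1):
        q += alpha_fn(k) * (x - q)
    return q

def noisy_samples(means, noise=1.0, seed=0):
    """Draw one Gaussian sample around each mean (hypothetical test data)."""
    rng = random.Random(seed)
    return [m + rng.gauss(0.0, noise) for m in means]

n = 20000
# Stationary target: the true mean stays at 5.
stationary = noisy_samples([5.0] * n)
# Slowly changing target: the true mean moves from 5 towards 10.
drifting = noisy_samples([5.0 + 5.0 * t / n for t in range(n)])

for name, fn in [("1/k", lambda k: 1.0 / k), ("0.1", lambda k: 0.1)]:
    print(name, run_estimate(fn, stationary), run_estimate(fn, drifting))
```

With the 1/k rule the final estimate is exactly the average of all samples seen so far, which is a useful reference point when comparing the other rules on the stationary and drifting data.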


Related Book

ISBN: 9780521519007

1st Edition

Authors: David L. Poole, Alan K. Mackworth


Question Posted: Oct 12, 2024 11:00 AM
