Question:

Exercise 11.9 Consider four different ways to derive the value of αk from k in Q-learning (note that for Q-learning with varying αk, there must be a different count k for each state–action pair).

i) Let αk = 1/k.

ii) Let αk = 10/(9 + k).

iii) Let αk = 0.1.

iv) Let αk = 0.1 for the first 10,000 steps, αk = 0.01 for the next 10,000 steps, αk = 0.001 for the next 10,000 steps, αk = 0.0001 for the next 10,000 steps, and so on.
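
The four schedules, and the per-pair count k that the note requires, can be sketched in Python as follows. This is only an illustrative sketch, not part of the exercise: the names alpha_i through alpha_iv, q_update, Q, and counts are hypothetical, the discount gamma = 0.9 is an arbitrary choice, and schedule (iv) is assumed to count "steps" with the same per-pair k as the other schedules.

```python
from collections import defaultdict


def alpha_i(k):
    """Schedule (i): alpha_k = 1/k, for k = 1, 2, ..."""
    return 1.0 / k


def alpha_ii(k):
    """Schedule (ii): alpha_k = 10/(9 + k)."""
    return 10.0 / (9 + k)


def alpha_iii(k):
    """Schedule (iii): constant alpha_k = 0.1."""
    return 0.1


def alpha_iv(k):
    """Schedule (iv): 0.1 for k in 1..10,000, then 0.01, 0.001, ...

    Assumption: 'steps' are counted by the same k as the other schedules.
    """
    return 0.1 ** (1 + (k - 1) // 10_000)


def q_update(Q, counts, s, a, r, s_next, actions, alpha_fn, gamma=0.9):
    """One tabular Q-learning update with a separate count k per (s, a).

    Q and counts are dictionaries keyed by (state, action); gamma is an
    assumed discount factor. Returns the step size that was used.
    """
    counts[(s, a)] += 1                      # k for this state-action pair only
    alpha = alpha_fn(counts[(s, a)])
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return alpha


# Example use with hypothetical states "s0", "s1" and actions "a", "b".
Q = defaultdict(float)
counts = defaultdict(int)
q_update(Q, counts, "s0", "a", 1.0, "s1", ["a", "b"], alpha_i)
```

Passing the schedule in as alpha_fn keeps the update rule identical across the four cases, so any experiment for parts (b) and (c) varies only the step size.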

(a) Which of these will converge to the true Q-value in theory?
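
As a reminder for part (a), and not as part of the exercise statement, the standard sufficient conditions on the step sizes for tabular Q-learning to converge (assuming also that every state–action pair is visited infinitely often and rewards are bounded) are usually stated as:

```latex
\sum_{k=1}^{\infty} \alpha_k = \infty
\qquad \text{and} \qquad
\sum_{k=1}^{\infty} \alpha_k^{2} < \infty .
```

Each of the four schedules can be checked against these two sums separately.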

(b) Which converges to the true Q-value in practice (i.e., in a reasonable number of steps)? Try it for more than one domain.

(c) Which can adapt when the environment changes slowly?
