Question:

5. Consider four different ways to derive the value of αk from k in Q-learning (note that for Q-learning with varying αk, there must be a different count k for each state–action pair).

(a) Let αk = 1/k.

(b) Let αk = 10/(9 + k).

(c) Let αk = 0.1.

(d) Let αk = 0.1 for the first 10,000 steps, αk = 0.01 for the next 10,000 steps, αk = 0.001 for the next 10,000 steps, αk = 0.0001 for the next 10,000 steps, and so on.
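The four step-size rules above can be written as small functions of the count k, which may make it easier to compare them side by side. This is only a sketch: the function names are made up for illustration, k is assumed to start at 1, and in rule (d) "steps" is assumed to mean the same count k (the question could also be read as meaning global time steps).

```python
def alpha_a(k: int) -> float:
    """Rule (a): alpha_k = 1/k."""
    return 1.0 / k


def alpha_b(k: int) -> float:
    """Rule (b): alpha_k = 10/(9 + k)."""
    return 10.0 / (9 + k)


def alpha_c(k: int) -> float:
    """Rule (c): constant alpha_k = 0.1."""
    return 0.1


def alpha_d(k: int) -> float:
    """Rule (d): 0.1 for k = 1..10,000, then divided by 10 every 10,000 steps."""
    block = (k - 1) // 10_000  # 0 for the first 10,000 steps, 1 for the next, ...
    return 0.1 * 10.0 ** (-block)


if __name__ == "__main__":
    # Print a few values of each rule for comparison.
    for k in (1, 10, 100, 10_000, 10_001, 20_001):
        print(k, alpha_a(k), alpha_b(k), alpha_c(k), alpha_d(k))
```

Rules (a) and (b) both start at 1 for k = 1, but (b) decays more slowly early on, while (d) is piecewise constant.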

(a) Which of these will converge to the true Q-value in theory?

(b) Which converges to the true Q-value in practice (i.e., in a reasonable number of steps)? Try it for more than one domain.

(c) Which are able to adapt if the environment changes slowly?
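For the experiments asked for in (b) and (c), one possible starting point is a tabular Q-learning update that keeps a separate count k for each state–action pair, as the question requires. The sketch below is an assumption-laden illustration, not a reference implementation: `make_learner`, `alpha_fn`, the discount `gamma = 0.9`, and the zero initial Q-values are all hypothetical choices, and the environment loop is left out.

```python
from collections import defaultdict


def make_learner(alpha_fn, gamma=0.9):
    """Build a tabular Q-learner whose step size depends on a per-pair count k.

    alpha_fn maps the visit count k (starting at 1) to a step size alpha_k.
    gamma is an assumed discount factor.
    """
    q = defaultdict(float)     # Q[(state, action)], assumed to start at 0
    counts = defaultdict(int)  # k for each (state, action) pair

    def update(state, action, reward, next_state, next_actions):
        # Increment the count for this specific state-action pair.
        counts[(state, action)] += 1
        alpha = alpha_fn(counts[(state, action)])
        # Value of the best next action; 0 if next_state is terminal.
        best_next = max((q[(next_state, a)] for a in next_actions), default=0.0)
        target = reward + gamma * best_next
        q[(state, action)] += alpha * (target - q[(state, action)])
        return q[(state, action)]

    return q, counts, update


if __name__ == "__main__":
    # With alpha_k = 1/k and a terminal next state, Q is the running
    # average of the rewards seen so far for that pair.
    q, counts, update = make_learner(lambda k: 1.0 / k)
    update("s", "a", 1.0, "end", [])
    print(update("s", "a", 3.0, "end", []))
```

Passing a different step-size rule as `alpha_fn` changes only how quickly old estimates are overwritten, which is the property that sub-questions (b) and (c) ask you to compare.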
