
Question:

For the following reinforcement learning algorithms:

(i) Q-learning with fixed α and 80% exploitation.
(ii) Q-learning with α_k = 1/k and 80% exploitation.
(iii) Q-learning with α_k = 1/k and 100% exploitation.
(iv) SARSA learning with α_k = 1/k and 80% exploitation.
(v) SARSA learning with α_k = 1/k and 100% exploitation.
(vi) Feature-based SARSA learning with softmax action selection.
(vii) A model-based reinforcement learner with 50% exploitation.
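The Q-learning and SARSA variants listed above differ in two components. The first is the step size: a fixed α, or α_k = 1/k, where k counts visits to a state-action pair. The second is the action-selection rule: exploiting a given fraction of the time, or softmax. Below is a minimal Python sketch of those components. It assumes tabular Q-values stored in a dict keyed by (state, action) and reads "80% exploitation" as epsilon-greedy with exploit probability 0.8. All function names are hypothetical. The model-based learner in (vii) is not covered.

```python
import math
import random
from collections import defaultdict


def choose_action(Q, s, actions, exploit_prob, rng):
    """Epsilon-greedy: with probability exploit_prob take the greedy action
    (ties go to the first action listed), otherwise pick uniformly at random."""
    if rng.random() < exploit_prob:
        return max(actions, key=lambda a: Q[(s, a)])
    return rng.choice(actions)


def softmax_action(Q, s, actions, temperature, rng):
    """Softmax (Boltzmann) selection: P(a) is proportional to exp(Q(s, a) / temperature)."""
    prefs = [Q[(s, a)] / temperature for a in actions]
    top = max(prefs)  # subtract the max before exp() to avoid overflow
    weights = [math.exp(p - top) for p in prefs]
    return rng.choices(actions, weights=weights, k=1)[0]


def step_size(counts, s, a, fixed_alpha=None):
    """Record a visit to (s, a); return fixed_alpha if given, else alpha_k = 1/k."""
    counts[(s, a)] += 1
    return fixed_alpha if fixed_alpha is not None else 1.0 / counts[(s, a)]


def q_learning_update(Q, s, a, r, s2, actions, alpha, gamma):
    # Off-policy target: uses the best next action, whichever action is taken next.
    target = r + gamma * max(Q[(s2, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s2, a2, alpha, gamma):
    # On-policy target: uses the next action a2 that the agent actually selected.
    target = r + gamma * Q[(s2, a2)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

Under these assumptions, passing exploit_prob=0.8 or 1.0 to choose_action gives the 80% or 100% exploitation variants. Passing fixed_alpha to step_size gives the fixed-α variant of (i). The only difference between the two update functions is the next-action value in the target.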

(a) Which of the reinforcement learning algorithms will find the optimal policy, given enough time?

(b) Which ones will actually follow the optimal policy?
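Parts (a) and (b) separate two things: the policy a learner has found (the greedy policy with respect to its Q-values) and the actions it actually executes. The following sketch illustrates that distinction on a hypothetical one-state, two-action problem with deterministic rewards. It uses Q-learning with α_k = 1/k and epsilon-greedy selection. The problem, the reward values, and the tie-breaking rule are assumptions chosen only to make the effect visible.

```python
import random
from collections import defaultdict

# Hypothetical one-step problem: a single state, two actions, deterministic rewards.
# "bad" is listed first so that ties between zero-initialised Q-values favour it.
ACTIONS = ["bad", "good"]
REWARD = {"bad": 0.0, "good": 1.0}


def run(exploit_prob, steps, seed=0):
    """Q-learning with alpha_k = 1/k and epsilon-greedy selection on one-step episodes.
    Returns (greedy action after learning, fraction of executed actions that were "bad")."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    counts = defaultdict(int)
    bad_taken = 0
    for _ in range(steps):
        if rng.random() < exploit_prob:
            a = max(ACTIONS, key=lambda b: Q[b])  # exploit: greedy, ties -> first listed
        else:
            a = rng.choice(ACTIONS)               # explore: uniform random action
        counts[a] += 1
        # Each episode ends after one step, so the Q-learning target is just the reward.
        Q[a] += (1.0 / counts[a]) * (REWARD[a] - Q[a])
        if a == "bad":
            bad_taken += 1
    greedy = max(ACTIONS, key=lambda b: Q[b])
    return greedy, bad_taken / steps
```

On this toy problem, run(0.8, 10000) should return "good" as the greedy action. It still executes "bad" on roughly 10% of steps, which is half of the 20% exploratory steps. By contrast, run(1.0, 1000) never explores and stays on "bad" because of the tie-breaking rule. This is one run on one constructed problem, not a general argument.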
