Question:
For the following reinforcement learning algorithms:
(i) Q-learning with fixed α and 80% exploitation.
(ii) Q-learning with αk = 1/k and 80% exploitation.
(iii) Q-learning with αk = 1/k and 100% exploitation.
(iv) SARSA learning with αk = 1/k and 80% exploitation.
(v) SARSA learning with αk = 1/k and 100% exploitation.
(vi) Feature-based SARSA learning with softmax action selection.
(vii) A model-based reinforcement learner with 50% exploitation.
(a) Which of the reinforcement learning algorithms will find the optimal policy, given enough time?
(b) Which ones will actually follow the optimal policy?
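To make the distinctions in the list concrete, here is a minimal tabular sketch of the pieces the variants combine: the exploitation probability, the step size (fixed α versus αk = 1/k), and the Q-learning versus SARSA update. It is an illustration under stated assumptions, not the book's code: the function names, the discount `gamma=0.9`, the dictionary-based Q-table, and the reading of k as a per state-action visit count are all hypothetical choices made for the example.

```python
import random
from collections import defaultdict


def select_action(Q, state, actions, exploit_prob=0.8, rng=random):
    """With probability exploit_prob take a greedy action, otherwise a uniformly random one.

    exploit_prob=0.8 corresponds to "80% exploitation"; exploit_prob=1.0 never explores.
    """
    if rng.random() < exploit_prob:
        best = max(Q[(state, a)] for a in actions)
        # Break ties among equally good actions at random.
        return rng.choice([a for a in actions if Q[(state, a)] == best])
    return rng.choice(actions)


def step_size(visits, state, action, fixed_alpha=None):
    """Return a fixed alpha if one is given, otherwise alpha_k = 1/k.

    Assumption: k counts how many times this (state, action) pair has been updated.
    """
    visits[(state, action)] += 1
    if fixed_alpha is not None:
        return fixed_alpha
    return 1.0 / visits[(state, action)]


def q_learning_update(Q, s, a, r, s_next, actions, alpha, gamma=0.9):
    """Off-policy update: bootstrap from the best action available in s_next."""
    target = r + gamma * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])


def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma=0.9):
    """On-policy update: bootstrap from the action actually chosen in s_next."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

The only difference between the two updates is the bootstrap term: Q-learning uses the maximum over next actions regardless of what the agent does next, while SARSA uses the value of the action the (possibly exploring) policy selects. That difference is what parts (a) and (b) ask you to reason about.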
Related Book For
Artificial Intelligence: Foundations Of Computational Agents
ISBN: 9781009258197
3rd Edition
Authors: David L. Poole, Alan K. Mackworth