Question:
The model-based reinforcement learner allows for a different form of optimism in the face of uncertainty. The algorithm can be started with each state having a transition to a “nirvana” state, which has very high Q-value (but which will never be reached in practice, and so the probability will shrink to zero).
(a) Does this perform differently than initializing all Q-values to a high value? Does it work better, worse, or the same?
(b) How high does the Q-value for the nirvana state need to be to work most effectively? Suggest a reason why one value might be good, and test it.
(c) Could this method be used for the other RL algorithms? Explain how or why not.
Step by Step Answer:
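One way to make the construction concrete is a small tabular sketch. This is not the book's code: the class name `NirvanaModelLearner`, the value `V_NIRVANA = 10`, and the two-state environment are all illustrative choices of mine. The key idea from the question is implemented directly: every state–action pair begins with one pseudo-count of a transition to a fictitious high-value "nirvana" state, so its estimated probability is 1/(n+1) after n real observations and shrinks toward zero as evidence accumulates.

```python
GAMMA = 0.9
NIRVANA = "nirvana"   # fictitious absorbing state; name is my choice
V_NIRVANA = 10.0      # hypothetical high value (part (b) asks what works best)

class NirvanaModelLearner:
    """Tabular model-based learner with optimistic nirvana pseudo-transitions."""

    def __init__(self, states, actions):
        self.states, self.actions = list(states), list(actions)
        # Transition counts; each (s, a) starts with one pseudo-count to nirvana.
        self.counts = {s: {a: {NIRVANA: 1.0} for a in self.actions}
                       for s in self.states}
        self.rsum = {s: {a: 0.0 for a in self.actions} for s in self.states}
        self.V = {s: 0.0 for s in self.states}
        self.V[NIRVANA] = V_NIRVANA          # fixed; never updated by planning

    def observe(self, s, a, r, s2):
        self.counts[s][a][s2] = self.counts[s][a].get(s2, 0.0) + 1.0
        self.rsum[s][a] += r

    def q(self, s, a):
        c = self.counts[s][a]
        total = sum(c.values())
        n_real = total - 1.0                 # exclude the nirvana pseudo-count
        r_hat = self.rsum[s][a] / n_real if n_real > 0 else 0.0
        expected_v = sum(cnt / total * self.V[s2] for s2, cnt in c.items())
        return r_hat + GAMMA * expected_v

    def plan(self, sweeps=50):
        """Value iteration on the estimated (optimistic) model."""
        for _ in range(sweeps):
            for s in self.states:
                self.V[s] = max(self.q(s, a) for a in self.actions)

    def policy(self, s):
        return max(self.actions, key=lambda a: self.q(s, a))

# Tiny deterministic two-state environment: 'go' from state 1 earns reward 1.
env = {(0, "stay"): (0, 0.0), (0, "go"): (1, 0.0),
       (1, "stay"): (1, 0.0), (1, "go"): (0, 1.0)}
agent = NirvanaModelLearner([0, 1], ["stay", "go"])
s = 0
for _ in range(500):          # purely greedy control; the nirvana bonus
    agent.plan(30)            # alone drives exploration of untried actions
    a = agent.policy(s)
    s2, r = env[(s, a)]
    agent.observe(s, a, r, s2)
    s = s2
agent.plan(200)
```

A point this sketch illustrates for part (a): with a plain optimistic Q-value initialization, the optimism at a state–action pair can be largely overwritten by its first real backup, whereas here the bonus is a probability-weighted term that decays gradually, at rate 1/(n+1) in the visit count of that particular pair.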
Artificial Intelligence: Foundations of Computational Agents, 3rd Edition
ISBN: 9781009258197
Authors: David L. Poole, Alan K. Mackworth