Question:

The model-based reinforcement learner allows for a different form of optimism in the face of uncertainty. The algorithm can be started with each state having a transition to a “nirvana” state, which has a very high Q-value (but which will never be reached in practice, so its estimated transition probability will shrink toward zero).

(a) Does this perform differently from initializing all Q-values to a high value? Does it work better, worse, or the same?

(b) How high does the Q-value for the nirvana state need to be to work most effectively? Suggest a reason why one value might be good, and test it.

(c) Could this method be used for the other RL algorithms? Explain how or why not.
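
For concreteness, below is a minimal Python sketch of the construction the question describes: a tabular model-based learner in which every (state, action) pair starts with one fictitious transition to the nirvana state. The class name, the constants NIRVANA and NIRVANA_VALUE, and the choice of planner (simple value iteration on the learned model) are all assumptions made for illustration; only the nirvana-state trick itself comes from the question.

```python
from collections import defaultdict

NIRVANA = "nirvana"       # hypothetical absorbing state (name is an assumption)
NIRVANA_VALUE = 100.0     # the value part (b) asks about; 100.0 is a placeholder
GAMMA = 0.9               # discount factor (assumed)

class OptimisticModelLearner:
    """Tabular model-based learner that starts every (state, action) pair
    with one fictitious transition to the nirvana state. The planner
    (value iteration on the learned model) is one simple choice."""

    def __init__(self, states, actions):
        self.states, self.actions = states, actions
        # One fictitious visit to nirvana per (s, a); real visits accumulate
        # alongside it, so P(nirvana | s, a) = 1 / (1 + real visits).
        self.counts = {(s, a): defaultdict(int, {NIRVANA: 1})
                       for s in states for a in actions}
        self.reward_sum = defaultdict(float)
        self.V = {s: 0.0 for s in states}
        self.V[NIRVANA] = NIRVANA_VALUE   # never updated: nirvana keeps its value

    def observe(self, s, a, r, s2):
        """Record a real transition; the fictitious count is never incremented."""
        self.counts[(s, a)][s2] += 1
        self.reward_sum[(s, a)] += r

    def q(self, s, a):
        """One-step lookahead Q-value under the current estimated model."""
        n = sum(self.counts[(s, a)].values())     # includes the fictitious visit
        real = n - 1
        r_hat = self.reward_sum[(s, a)] / real if real else 0.0
        expected_v = sum(c / n * self.V[s2]
                         for s2, c in self.counts[(s, a)].items())
        return r_hat + GAMMA * expected_v

    def plan(self, sweeps=50):
        """Value iteration over the real states only, so the nirvana state's
        value stays fixed at NIRVANA_VALUE."""
        for _ in range(sweeps):
            for s in self.states:
                self.V[s] = max(self.q(s, a) for a in self.actions)

    def act(self, s):
        """Greedy action; untried actions look good because nirvana does."""
        return max(self.actions, key=lambda a: self.q(s, a))
```

Running this side by side with a learner that simply initializes all Q-values to a high constant is one way to test part (a), and sweeping NIRVANA_VALUE over a range is one way to test part (b).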
