Question:

6. The model-based reinforcement learner allows for a different form of optimism in the face of uncertainty. The algorithm can be started with each state having a transition to a “nirvana” state, which has a very high Q-value (but which will never be reached in practice, so its estimated probability will shrink to zero).

(a) Does this perform differently from initializing all Q-values to a high value? Does it work better, worse, or the same?

(b) How high does the Q-value for the nirvana state need to be for this to work most effectively? Suggest a reason why one value might be good, and test it.

(c) Could this method be used for the other RL algorithms? Explain how or why not.
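
To make the construction concrete, below is a minimal sketch of how the nirvana-state initialization could look in a tabular model-based learner. The class name ModelBasedLearner, the constant NIRVANA_VALUE, and the choice of a single fictitious pseudo-count per state-action pair are illustrative assumptions, not anything specified by the exercise.

```python
from collections import defaultdict

NIRVANA = "nirvana"      # absorbing fictitious state
NIRVANA_VALUE = 100.0    # assumed optimistic value; part (b) asks how high this should be
GAMMA = 0.9              # discount factor

class ModelBasedLearner:
    def __init__(self, states, actions):
        self.states = list(states)
        self.actions = list(actions)
        # Transition counts: counts[s][a][s2]. Each (s, a) pair starts with one
        # fictitious transition to the nirvana state, so the estimated probability
        # of reaching nirvana shrinks toward zero as real transitions accumulate.
        self.counts = {s: {a: defaultdict(float, {NIRVANA: 1.0})
                           for a in self.actions} for s in self.states}
        self.reward_sum = defaultdict(float)   # summed observed rewards per (s, a)
        self.V = {s: 0.0 for s in self.states}
        self.V[NIRVANA] = NIRVANA_VALUE        # fixed, never updated

    def observe(self, s, a, r, s2):
        """Record one real transition and update the learned model."""
        self.counts[s][a][s2] += 1.0
        self.reward_sum[(s, a)] += r

    def q_value(self, s, a):
        """Q estimated from the learned model (one-step lookahead on V)."""
        c = self.counts[s][a]
        total = sum(c.values())
        # Exclude the fictitious count when averaging observed rewards.
        expected_r = self.reward_sum[(s, a)] / max(total - 1.0, 1.0)
        expected_next = sum(n / total * self.V[s2] for s2, n in c.items())
        return expected_r + GAMMA * expected_next

    def plan_step(self):
        """One sweep of asynchronous value iteration on the learned model."""
        for s in self.states:
            self.V[s] = max(self.q_value(s, a) for a in self.actions)

    def choose_action(self, s):
        """Greedy action; unexplored (s, a) pairs look good because of nirvana."""
        return max(self.actions, key=lambda a: self.q_value(s, a))
```

For an unvisited (s, a) pair the only recorded successor is the nirvana state, so q_value returns GAMMA * NIRVANA_VALUE, which is what draws the greedy policy toward unexplored actions; each real observation adds a count and dilutes the fictitious nirvana transition.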
