Question:

77. In a Markov decision problem, another criterion often used, different than the expected average return per unit time, is that of the expected discounted return. In this criterion we choose a number $\alpha$, $0 < \alpha < 1$, and try to choose a policy so as to maximize $E\left[\sum_{i=0}^{\infty} \alpha^i R(X_i, a_i)\right]$ (that is, rewards at time $n$ are discounted at rate $\alpha^n$). Suppose that the initial state is chosen according to the probabilities $b_i$. That is,

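To make the discounted criterion concrete, here is a minimal sketch that evaluates the expected discounted return of one fixed stationary policy. It is only an illustration, not the exercise's solution. It assumes a finite state space and uses the standard recursion V(i) = R(i, a_i) + α Σ_j P_ij V(j). The function name `discounted_return` and the arguments `P`, `r`, `b` are hypothetical names chosen for this example.

```python
def discounted_return(P, r, b, alpha, tol=1e-12, max_iter=100_000):
    """Expected discounted return of a fixed stationary policy (illustrative).

    P[i][j] : transition probability from state i to state j under the policy
    r[i]    : one-step reward R(i, a_i) collected in state i under the policy
    b[i]    : probability that the initial state is i
    alpha   : discount factor, 0 < alpha < 1
    """
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    n = len(r)
    v = [0.0] * n  # v[i] approximates the discounted return starting from state i
    for _ in range(max_iter):
        # Apply V(i) = r(i) + alpha * sum_j P[i][j] * V(j) once to every state.
        new_v = [
            r[i] + alpha * sum(P[i][j] * v[j] for j in range(n))
            for i in range(n)
        ]
        converged = max(abs(new_v[i] - v[i]) for i in range(n)) < tol
        v = new_v
        if converged:
            break
    # Average the per-state values over the initial distribution b.
    return sum(b[i] * v[i] for i in range(n))


# Assumed toy data: two states, start in state 0 with probability 1.
P = [[0.5, 0.5], [0.2, 0.8]]
r = [1.0, 2.0]
b = [1.0, 0.0]
print(discounted_return(P, r, b, alpha=0.9))
```

Because α < 1, each sweep shrinks the error by a factor of α, so the iteration converges for any bounded rewards. Comparing policies under this criterion would mean repeating the evaluation for each candidate policy.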