Question:

In a Markov decision problem, another criterion often used, different from the expected average return per unit time, is that of the expected discounted return.

In this criterion we choose a number $\alpha$, $0 < \alpha < 1$, and try to choose a policy so as to maximize

$$E\left[\sum_{i=0}^{\infty} \alpha^{i} R(X_i, a_i)\right]$$

(that is, rewards at time $n$ are discounted at rate $\alpha^{n}$). Suppose that the initial state is chosen according to the probabilities $b_i$.

That is, $P\{X_0 = i\} = b_i$.
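As a quick numerical illustration of the discounted-return criterion, here is a minimal sketch, not part of the original problem: it assumes a small finite MDP with made-up transition probabilities, rewards, discount factor, and initial distribution, evaluates a fixed stationary policy by solving $V = r_\pi + \alpha P_\pi V$, and then weights $V$ by the initial probabilities $b_i$ to get the expected discounted return under that policy.

```python
import numpy as np

# Illustrative sketch only: all numbers below (states, actions, P, R, alpha, b,
# and the policy) are assumptions, not data from the original problem.

alpha = 0.9                                # discount factor, 0 < alpha < 1

# Two states, two actions. P[a] is the transition matrix when action a is used.
P = {
    0: np.array([[0.7, 0.3],
                 [0.4, 0.6]]),
    1: np.array([[0.2, 0.8],
                 [0.9, 0.1]]),
}
R = np.array([[1.0, 0.5],                  # R[s, a]: one-step reward in state s under action a
              [0.0, 2.0]])

policy = np.array([0, 1])                  # stationary policy: action taken in each state
b = np.array([0.5, 0.5])                   # initial-state probabilities b_i

n = len(policy)
# Transition matrix and reward vector induced by the policy.
P_pi = np.array([P[policy[s]][s] for s in range(n)])
r_pi = np.array([R[s, policy[s]] for s in range(n)])

# V solves V = r_pi + alpha * P_pi V, i.e. (I - alpha * P_pi) V = r_pi.
V = np.linalg.solve(np.eye(n) - alpha * P_pi, r_pi)

# Expected discounted return when X_0 is drawn from b: sum_i b_i V(i).
expected_discounted_return = b @ V
print("V =", V, " expected discounted return =", expected_discounted_return)
```

Maximizing over policies (rather than evaluating one fixed policy, as above) would then be done by methods such as value iteration, policy iteration, or a linear-programming formulation of the discounted problem.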


