Question:
In a Markov decision problem, another criterion often used, different from the expected average return per unit time, is that of the expected discounted return.
In this criterion we choose a number $\alpha$, $0 < \alpha < 1$, and try to choose a policy so as to maximize

$$E\left[\sum_{i=0}^{\infty} \alpha^{i} R(X_i, a_i)\right]$$

(that is, rewards at time $n$ are discounted at rate $\alpha^{n}$). Suppose that the initial state is chosen according to the probabilities $b_i$.
That is, $P\{X_0 = i\} = b_i$.
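To make the discounted-return criterion concrete, here is a small numerical sketch (not part of the original exercise). For a fixed stationary policy the vector of expected discounted returns satisfies $V = r_\pi + \alpha P_\pi V$, so when $X_0$ is drawn from $b$ the objective $E[\sum_{i=0}^{\infty} \alpha^i R(X_i, a_i)]$ equals $b \cdot V$. The transition probabilities, rewards, policy, and initial distribution below are all made-up illustrative values.

```python
import numpy as np

# Hypothetical MDP used only to illustrate the criterion: 3 states, 2 actions.
# P[a, i, j] = probability of moving from state i to state j under action a;
# R[i, a] = one-step reward R(i, a).  All numbers are invented.
P = np.array([
    [[0.7, 0.2, 0.1],
     [0.1, 0.8, 0.1],
     [0.3, 0.3, 0.4]],
    [[0.2, 0.5, 0.3],
     [0.4, 0.4, 0.2],
     [0.1, 0.1, 0.8]],
])
R = np.array([[1.0, 0.5],
              [0.0, 2.0],
              [0.5, 1.0]])

alpha = 0.9                     # discount factor, 0 < alpha < 1
b = np.array([0.5, 0.3, 0.2])   # initial distribution, P{X_0 = i} = b_i
policy = np.array([0, 1, 1])    # a stationary policy: the action used in each state

# Under this fixed policy the chain has transition matrix P_pi and reward
# vector r_pi, and the value vector satisfies
#     V = r_pi + alpha * P_pi V   =>   V = (I - alpha * P_pi)^{-1} r_pi.
P_pi = np.array([P[policy[i], i] for i in range(3)])
r_pi = np.array([R[i, policy[i]] for i in range(3)])
V = np.linalg.solve(np.eye(3) - alpha * P_pi, r_pi)

# Expected discounted return when X_0 ~ b:  E[sum_i alpha^i R(X_i, a_i)] = b . V
print("expected discounted return:", b @ V)
```

Maximizing the criterion then amounts to picking the policy whose value of $b \cdot V$ is largest; the sketch above only evaluates one fixed policy.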