Question:
In a Markov decision problem, another criterion often used, different than the expected average return per unit time, is that of the expected discounted return. In this criterion we choose a number $\alpha$, $0 < \alpha < 1$, and try to choose a policy so as to maximize $E\left[\sum_{n=0}^{\infty} \alpha^n R(X_n, a_n)\right]$. (That is, rewards at time $n$ are discounted at rate $\alpha^n$.) Suppose that the initial state is chosen according to the probabilities $b_i$. That is,
$$P\{X_0 = i\} = b_i, \quad i = 1, \ldots, n.$$
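
To illustrate the discounted-return criterion, here is a minimal sketch (not part of the original exercise). For a fixed stationary policy with transition matrix $P$ and expected one-step reward vector $r$, the discounted value vector satisfies $v = r + \alpha P v$, so $v = (I - \alpha P)^{-1} r$, and the expected discounted return when $X_0 \sim b$ is $b^\top v$. The specific matrix `P`, rewards `r`, initial distribution `b`, and discount `alpha` below are illustrative assumptions, not data from the problem.

```python
import numpy as np

# Sketch: expected discounted return of a fixed stationary policy.
# Assumed example data (not from the exercise): a 3-state chain.
alpha = 0.9                                  # discount factor, 0 < alpha < 1
P = np.array([[0.5, 0.3, 0.2],               # P[i, j] = transition prob i -> j under the policy
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
r = np.array([1.0, 0.0, 2.0])                # expected one-step reward in each state
b = np.array([0.3, 0.3, 0.4])                # initial distribution, P(X_0 = i) = b_i

# Value of each starting state: solve (I - alpha * P) v = r.
v = np.linalg.solve(np.eye(3) - alpha * P, r)

# Expected discounted return with X_0 ~ b: E[sum_n alpha^n R(X_n, a_n)] = b . v
expected_discounted_return = b @ v
print(expected_discounted_return)
```

A Monte Carlo simulation of the chain (summing $\alpha^n$-discounted rewards along sample paths started from $b$) should agree with this linear-algebra computation up to sampling error.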