Question:

77. In a Markov decision problem, another criterion often used, different than the expected average return per unit time, is that of the expected discounted return. In this criterion we choose a number $\alpha$, $0 < \alpha < 1$, and try to choose a policy so as to maximize $E\left[\sum_{i=0}^{\infty} \alpha^i R(X_i, a_i)\right]$ (that is, rewards at time $n$ are discounted at rate $\alpha^n$). Suppose that the initial state is chosen according to the probabilities $b_i$. That is,
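
As a rough numerical illustration of the discounted criterion in the question, the sketch below approximates the expected discounted return for one fixed stationary policy by truncating the infinite sum. It is not the textbook's solution method. The function name `discounted_return`, the `horizon` parameter, and the two-state example data are all hypothetical; `P` and `r` are assumed to be the transition probabilities and one-step rewards already induced by the chosen policy.

```python
def discounted_return(P, r, b, alpha, horizon=1000):
    """Approximate E[sum_i alpha^i R(X_i, a_i)] for a fixed stationary policy.

    P[i][j] : transition probability from state i to j under the policy (assumed input)
    r[i]    : one-step reward earned in state i under the policy (assumed input)
    b[i]    : probability that the initial state is i
    alpha   : discount factor, 0 < alpha < 1
    horizon : number of terms kept from the infinite sum (hypothetical cutoff)
    """
    n = len(b)
    dist = list(b)      # distribution of X_i, starting from the initial probabilities
    total = 0.0
    discount = 1.0      # holds alpha^i at step i
    for _ in range(horizon):
        # Expected reward at this step, weighted by alpha^i.
        total += discount * sum(dist[i] * r[i] for i in range(n))
        # Advance the state distribution one step: dist <- dist * P.
        dist = [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]
        discount *= alpha
    return total


# Hypothetical two-state example: the chain alternates between states 0 and 1,
# and only state 0 pays a reward.
P_example = [[0.0, 1.0], [1.0, 0.0]]
r_example = [1.0, 0.0]
b_example = [1.0, 0.0]
print(discounted_return(P_example, r_example, b_example, alpha=0.5))
```

Because $0 < \alpha < 1$, the discarded tail of the sum shrinks geometrically, so a moderate `horizon` is usually enough when rewards are bounded. Comparing this quantity across candidate policies is one simple way to see what "maximizing the expected discounted return" means in practice.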
