Question:

14. Consider the MDP of Example 9.29.

(a) As the discount varies between 0 and 1, how does the optimal policy change? Give an example of a discount that produces each different policy that can be obtained by varying the discount.

(b) How can the MDP and/or discount be changed so that the optimal policy is to relax when healthy and to party when sick? Give an MDP that changes as few of the probabilities, rewards or discount as possible to have this as the optimal policy.

(c) The optimal policy computed in Example 9.31 was to party when healthy and relax when sick. What is the distribution of states that the agent following this policy will visit? Hint: The policy induces a Markov chain, which has a stationary distribution. What is the average reward of this policy? Hint: The average reward can be obtained by computing the expected value of the immediate rewards with respect to the stationary distribution.
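
The two hints in part (c) can be sketched in code. The following is a minimal illustration, not a solution to the exercise: the transition matrix `P` and reward vector `r` are hypothetical placeholder numbers, not the values from Example 9.29, and the function names are made up for this sketch. It assumes that fixing a policy gives a row-stochastic matrix whose chain is irreducible and aperiodic, so that repeatedly applying the matrix converges to the stationary distribution.

```python
def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Approximate pi with pi = pi P by power iteration.

    P[i][j] is the probability of moving from state i to state j
    under the fixed policy (each row sums to 1).
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        # One step of the chain: nxt[j] = sum_i pi[i] * P[i][j]
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def average_reward(pi, r):
    """Expected immediate reward under the distribution pi."""
    return sum(p * x for p, x in zip(pi, r))


# Hypothetical placeholder numbers (NOT taken from Example 9.29).
# State 0 and state 1 stand in for the two states of the chain.
P = [[0.8, 0.2],
     [0.6, 0.4]]
r = [5.0, 1.0]  # immediate reward of the policy's action in each state

pi = stationary_distribution(P)
avg = average_reward(pi, r)
print(pi, avg)
```

With these placeholder numbers the stationary distribution is approximately (0.75, 0.25) and the average reward is approximately 4.0. For the exercise itself, `P` and `r` would be replaced by the transition probabilities and rewards that the party-when-healthy, relax-when-sick policy induces in Example 9.29.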
