Question:
14. Consider the MDP of Example 9.29.
(a) As the discount varies between 0 and 1, how does the optimal policy change? Give an example of a discount that produces each different policy that can be obtained by varying the discount.
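One way to explore (a) is to sweep the discount and recompute the optimal policy each time. Below is a minimal sketch. It assumes the transition probabilities and rewards of Example 9.29 are the values hard-coded in it. These are recalled from the book, not stated in this excerpt, so verify them before relying on the output. Example 9.31 uses value iteration; this sketch instead enumerates all four deterministic policies and evaluates each one exactly. `evaluate` and `optimal_policy` are hypothetical helper names. The discount must stay strictly below 1, because at gamma = 1 the linear system is singular.

```python
from itertools import product

# ASSUMED data: transition probabilities and rewards as recalled from
# Example 9.29; check them against the book before relying on the output.
STATES = ("healthy", "sick")
ACTIONS = ("relax", "party")
# P_HEALTHY[(s, a)]: probability that the next state is healthy after doing a in s
P_HEALTHY = {
    ("healthy", "relax"): 0.95, ("healthy", "party"): 0.7,
    ("sick", "relax"): 0.5, ("sick", "party"): 0.1,
}
# REWARD[(s, a)]: expected immediate reward for doing a in s
REWARD = {
    ("healthy", "relax"): 7, ("healthy", "party"): 10,
    ("sick", "relax"): 0, ("sick", "party"): 2,
}


def evaluate(policy, gamma):
    """Exact value of a deterministic policy: solve (I - gamma * P_pi) V = R_pi.

    The system is 2x2, so Cramer's rule is enough. Requires 0 <= gamma < 1.
    """
    ph = P_HEALTHY[("healthy", policy["healthy"])]  # P(healthy -> healthy)
    q = P_HEALTHY[("sick", policy["sick"])]         # P(sick -> healthy)
    rh = REWARD[("healthy", policy["healthy"])]
    rs = REWARD[("sick", policy["sick"])]
    a11, a12 = 1 - gamma * ph, -gamma * (1 - ph)
    a21, a22 = -gamma * q, 1 - gamma * (1 - q)
    det = a11 * a22 - a12 * a21
    return {"healthy": (rh * a22 - a12 * rs) / det,
            "sick": (a11 * rs - a21 * rh) / det}


def optimal_policy(gamma):
    """Brute force over all |ACTIONS|^|STATES| = 4 deterministic policies.

    An optimal policy is at least as good as any other in every state, so the
    policy with the largest sum of state values is optimal.
    """
    best, best_total = None, float("-inf")
    for acts in product(ACTIONS, repeat=len(STATES)):
        policy = dict(zip(STATES, acts))
        total = sum(evaluate(policy, gamma).values())
        if total > best_total:
            best, best_total = policy, total
    return best


for gamma in (0.0, 0.3, 0.5, 0.8, 0.95, 0.99):
    pol = optimal_policy(gamma)
    print(f"gamma={gamma}: healthy -> {pol['healthy']}, sick -> {pol['sick']}")
```

Under these assumed numbers, the loop reports party/party for 0.0 and 0.3. It reports party/relax for 0.5, 0.8 and 0.95, and relax/relax for 0.99. Bisecting on gamma between those samples narrows down where each switch happens.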
(b) How can the MDP and/or discount be changed so that the optimal policy is to relax when healthy and to party when sick? Give an MDP that changes as few of the probabilities, rewards or discount as possible to have this as the optimal policy.
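Part (b) can be explored by trial and error: change a number, then check whether "relax when healthy, party when sick" has become optimal. Below is a minimal sketch. It assumes Example 9.29's probabilities and rewards are the ones listed in the code, which are recalled values rather than numbers given in this excerpt. Rather than searching over all policies, the hypothetical helper `is_optimal` applies the policy-improvement test. A policy is optimal exactly when, in every state, its chosen action has the highest Q-value computed from the policy's own values.

```python
# ASSUMED data: Example 9.29's numbers as recalled; edit P_HEALTHY,
# REWARD or gamma to try a modified MDP.
ACTIONS = ("relax", "party")
P_HEALTHY = {("healthy", "relax"): 0.95, ("healthy", "party"): 0.7,
             ("sick", "relax"): 0.5, ("sick", "party"): 0.1}
REWARD = {("healthy", "relax"): 7, ("healthy", "party"): 10,
          ("sick", "relax"): 0, ("sick", "party"): 2}


def evaluate(policy, gamma, p_healthy, reward):
    """Exact V^pi of a deterministic policy (2x2 linear system, Cramer's rule)."""
    ph = p_healthy[("healthy", policy["healthy"])]  # P(healthy -> healthy)
    q = p_healthy[("sick", policy["sick"])]         # P(sick -> healthy)
    rh = reward[("healthy", policy["healthy"])]
    rs = reward[("sick", policy["sick"])]
    a11, a12 = 1 - gamma * ph, -gamma * (1 - ph)
    a21, a22 = -gamma * q, 1 - gamma * (1 - q)
    det = a11 * a22 - a12 * a21
    return {"healthy": (rh * a22 - a12 * rs) / det,
            "sick": (a11 * rs - a21 * rh) / det}


def q_value(s, a, v, gamma, p_healthy, reward):
    """Q^pi(s, a) = R(s, a) + gamma * sum over s' of P(s' | s, a) * V^pi(s')."""
    p = p_healthy[(s, a)]
    return reward[(s, a)] + gamma * (p * v["healthy"] + (1 - p) * v["sick"])


def is_optimal(policy, gamma, p_healthy=P_HEALTHY, reward=REWARD, eps=1e-9):
    """Policy-improvement test: pi is optimal iff no single-state switch helps."""
    v = evaluate(policy, gamma, p_healthy, reward)
    for s, chosen in policy.items():
        current = q_value(s, chosen, v, gamma, p_healthy, reward)
        if any(q_value(s, a, v, gamma, p_healthy, reward) > current + eps
               for a in ACTIONS):
            return False
    return True


target = {"healthy": "relax", "sick": "party"}
print(is_optimal(target, 0.8))  # False with the assumed original numbers
```

To test a candidate change, pass modified copies of the dictionaries, for example `is_optimal(target, 0.8, p_healthy=new_probs, reward=new_rewards)`, where `new_probs` and `new_rewards` are your own edited versions.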
(c) The optimal policy computed in Example 9.31 was to party when healthy and relax when sick. What is the distribution of states that the agent following this policy will visit? Hint: The policy induces a Markov chain, which has a stationary distribution. What is the average reward of this policy? Hint: The average reward can be obtained by computing the expected value of the immediate rewards with respect to the stationary distribution.
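The hints in (c) can be followed numerically. The sketch below assumes the Example 9.29 numbers for the two actions this policy takes. Partying when healthy gives reward 10, and the next state is healthy with probability 0.7. Relaxing when sick gives reward 0, and the next state is healthy with probability 0.5. These are recalled values, so check them against the book. `stationary_distribution` and `average_reward` are hypothetical helper names. Power iteration is used because it generalizes to more states. For two states, the closed form π(healthy) = P(sick→healthy) / (P(healthy→sick) + P(sick→healthy)) = 0.5 / (0.3 + 0.5) = 0.625 gives the same answer.

```python
# ASSUMED data: probabilities and rewards recalled from Example 9.29.
STATES = ("healthy", "sick")
# Markov chain induced by "party when healthy, relax when sick":
# P[i][j] = probability of moving from STATES[i] to STATES[j]
P = [
    [0.7, 0.3],  # healthy, party
    [0.5, 0.5],  # sick, relax
]
# Immediate reward of the action the policy takes in each state
R = [10, 0]  # R(healthy, party), R(sick, relax)


def stationary_distribution(P, tol=1e-12, max_iter=10_000):
    """Power iteration: apply pi <- pi P until pi stops changing."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new, pi)) < tol:
            return new
        pi = new
    raise RuntimeError("power iteration did not converge")


def average_reward(pi, R):
    """Expected immediate reward under the stationary distribution."""
    return sum(p * r for p, r in zip(pi, R))


pi = stationary_distribution(P)
print(dict(zip(STATES, pi)))  # approximately {'healthy': 0.625, 'sick': 0.375}
print(average_reward(pi, R))  # approximately 6.25
```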
Artificial Intelligence: Foundations of Computational Agents
ISBN: 9781107195394
2nd Edition
Authors: David L. Poole, Alan K. Mackworth