Question
The owner of a race horse wants to maximize the infinite-horizon discounted returns of his horse. The discount factor is 2/3. It is possible to participate in a race every day, but after participating the horse may not be fit the next day. If the horse is fit, the expected return for that day is $200,000. If the horse is tired, the expected return is only $100,000. Participation in a race is free. If the horse is fit and participates in a race, it is fit the next day with probability 2/3 and tired the next day with probability 1/3. If the horse is fit and does not participate in a race, it will still be fit the next day. Similarly, if the horse participates in a race while tired, it will be tired the next day. If a tired horse rests for a day, it will be fit the next day with probability 1/2 and still tired the next day with probability 1/2.
a. Formulate this problem as a Markov decision process problem. Describe the state and action spaces and give the transition probabilities and rewards.
b. Compute the optimal policy that maximizes the infinite discounted reward using policy iteration.
c. Formulate the primal and dual LPs and provide the optimal solution of the dual LP.
d. Consider the problem of question 1, but now focus on long-run average reward optimality. Compute the long-run average reward under each policy.
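
As a starting point for parts (a) and (b), the problem can be encoded and run through policy iteration in a few lines. This is a minimal sketch, not a reference solution. It assumes that a reward is earned only on days the horse races (a resting day pays 0), which the problem statement does not say explicitly, and it measures rewards in units of $100,000. The names `P`, `R`, `evaluate`, `improve`, and `policy_iteration` are illustrative choices.

```python
import numpy as np

GAMMA = 2.0 / 3.0            # discount factor from the problem statement
STATES = ("fit", "tired")    # state index 0 = fit, 1 = tired
ACTIONS = ("race", "rest")

# P[a][s, s2] = probability of moving from state s to state s2 under action a.
P = {
    "race": np.array([[2 / 3, 1 / 3],   # fit horse races: fit w.p. 2/3, tired w.p. 1/3
                      [0.0, 1.0]]),     # tired horse races: stays tired
    "rest": np.array([[1.0, 0.0],       # fit horse rests: stays fit
                      [0.5, 0.5]]),     # tired horse rests: fit w.p. 1/2
}

# ASSUMPTION: rewards (in units of $100,000) are earned only when racing.
R = {
    "race": np.array([2.0, 1.0]),
    "rest": np.array([0.0, 0.0]),
}


def evaluate(policy):
    """Policy evaluation: solve (I - gamma * P_pi) v = r_pi for a fixed policy."""
    P_pi = np.array([P[a][s] for s, a in enumerate(policy)])
    r_pi = np.array([R[a][s] for s, a in enumerate(policy)])
    return np.linalg.solve(np.eye(len(STATES)) - GAMMA * P_pi, r_pi)


def improve(v):
    """Policy improvement: choose the greedy action in each state."""
    q = np.array([R[a] + GAMMA * P[a] @ v for a in ACTIONS])  # shape (actions, states)
    return tuple(ACTIONS[i] for i in q.argmax(axis=0))


def policy_iteration(policy=("rest", "rest")):
    """Alternate evaluation and improvement until the policy stops changing."""
    while True:
        v = evaluate(policy)
        new_policy = improve(v)
        if new_policy == policy:
            return policy, v
        policy = new_policy


policy, v = policy_iteration()
print(dict(zip(STATES, policy)), v)
```

Under the stated reward assumption, the iteration settles on racing in both states, with discounted values of about 4.8 (fit) and 3.0 (tired) in units of $100,000. A different reading of the rewards, for example a return that is also paid on rest days, would change `R` and possibly the resulting policy.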