Question:

Let us presume that the dynamic programming equation (6) still holds when the state and action spaces are not finite, for the purposes of the following problem. An owner of a baseball team can spend any proportion \(p \in[0,1]\) of his current assets on free agents. He estimates that the team will come through and return him twice the amount that he spent with probability \(w\), but the team will fail and he will lose what he spent with probability \(l=1-w\). The owner plans to keep the team for \(T\) years before selling out. His goal is to maximize the expected value of the logarithm of his wealth when he sells the team.

(a) Model this problem as a Markov decision problem, including a description of the state and action spaces, a formula for the transition probabilities \(T(x, y ; a)\), and the single period and terminal reward functions.

(b) Write the dynamic programming equation for the problem.
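As a hedged sketch of the form such an equation might take (this is not a restatement of equation (6), which is not reproduced here): assume the state \(x>0\) is the current wealth, the action \(a \in[0,1]\) is the proportion spent, a success turns wealth \(x\) into \(x(1+a)\), a failure turns it into \(x(1-a)\), and there is no single-period reward. The symbol \(V_{t}\) for the value function at time \(t\) is introduced here for illustration only.

\[
V_{T}(x)=\log x, \qquad V_{t}(x)=\max _{a \in[0,1]}\left\{w\, V_{t+1}(x(1+a))+l\, V_{t+1}(x(1-a))\right\}, \quad t=0,1, \ldots, T-1 .
\]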

(c) If \(T=3\) and \(w>1 / 2\), show that the optimal action at each time 0, 1, and 2 is to bet a proportion \(a=2 w-1\) of the current wealth.
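The claim in (c) can be checked numerically with a minimal sketch, under the assumption that a success turns wealth \(x\) into \(x(1+a)\) and a failure turns it into \(x(1-a)\). With a logarithmic terminal reward, the wealth \(x\) then separates out as an additive \(\log x\) term at every stage, so each stage reduces to maximizing \(w \log (1+a)+l \log (1-a)\) over \(a\). The function names and the grid search below are illustrative choices, not part of the original problem, and a grid search is only a sanity check, not a proof.

```python
import math


def expected_log_growth(a: float, w: float) -> float:
    """One-period objective w*log(1+a) + (1-w)*log(1-a).

    Assumes wealth is multiplied by (1+a) on success and (1-a) on failure.
    """
    return w * math.log(1 + a) + (1 - w) * math.log(1 - a)


def best_proportion(w: float, grid_size: int = 10000) -> float:
    """Grid-search the maximizing proportion a over [0, 1).

    a = 1 is excluded because log(1 - a) is undefined there.
    """
    candidates = [i / grid_size for i in range(grid_size)]
    return max(candidates, key=lambda a: expected_log_growth(a, w))


if __name__ == "__main__":
    for w in (0.6, 0.75, 0.9):
        # Compare the numerical maximizer with the claimed value 2w - 1.
        print(w, best_proportion(w), 2 * w - 1)
```

For each \(w>1/2\) tried, the grid maximizer should agree with \(2w-1\) up to the grid resolution. This matches the first-order condition \(w/(1+a)=l/(1-a)\).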
