Question:
Consider an infinite horizon Markov decision problem with the usual notation, and let \(V\) be a function on the state space \(E\) satisfying the constraints of the linear programming problem (8). Adjoin to \(E\) an absorbing state \(\Delta\) carrying no reward, and adjoin to the action space an action \(a(\Delta)\) such that \(r(i, a(\Delta))=V(i)\) and \(T(i, \Delta ; a(\Delta))=1\). In other words, under action \(a(\Delta)\) the chain moves immediately to the absorbing state \(\Delta\), earning the one-time reward \(V(i)\) for that move and no reward thereafter.
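The statement of (8) is not reproduced in this excerpt; the sketch below assumes the standard linear programming formulation for a discounted MDP with discount factor \(\beta\), which is the form used in most treatments. If the book's (8) differs in detail, the constraint shape should still be analogous.

```latex
% Assumed form of the linear programming problem (8): find the
% componentwise-smallest function satisfying the Bellman inequalities.
\[
\text{minimize } \sum_{i \in E} V(i)
\quad \text{subject to} \quad
V(i) \;\ge\; r(i,a) + \beta \sum_{j \in E} T(i,j;a)\, V(j)
\qquad \text{for all } i \in E,\ a \in A.
\]
```

Under this form, the constraint contributed by the adjoined action \(a(\Delta)\) reads \(V(i) \ge r(i, a(\Delta)) + \beta V(\Delta) = V(i) + 0\), which holds with equality; that is the observation that drives part (a).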
(a) For the new problem, use (8) to show that the optimal policy is to take action \(a(\Delta)\) immediately.
(b) Deduce from (a) that \(V\) exceeds the optimal value function \(W\). Conclude that \(W\) satisfies the LP problem (8).
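As a numerical sanity check of the conclusion in part (b) (not part of the book's solution): on a toy two-state discounted MDP, any function \(V\) feasible for the Bellman inequalities dominates the optimal value function \(W\) obtained by value iteration. The MDP data, the discount factor, and the particular feasible \(V\) below are all invented for illustration.

```python
# Toy check of part (b): if V satisfies the LP constraints
#   V(i) >= r(i,a) + beta * sum_j T(i,j;a) V(j)  for every action a,
# then V dominates the optimal value function W.
# All numbers here are made up for illustration.

beta = 0.9  # discount factor (assumed)

# Two states {0,1}, two actions {0,1}: r[i][a] is the reward,
# P[i][a][j] the transition probability to state j.
r = [[1.0, 2.0], [0.5, 1.5]]
P = [
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.4, 0.6], [0.9, 0.1]],
]

def bellman(V):
    """One Bellman optimality backup (max over actions at each state)."""
    return [
        max(r[i][a] + beta * sum(P[i][a][j] * V[j] for j in range(2))
            for a in range(2))
        for i in range(2)
    ]

# Approximate the optimal value function W by value iteration.
W = [0.0, 0.0]
for _ in range(2000):
    W = bellman(W)

# A feasible V for the LP constraints: a large constant vector works
# because rewards are bounded (V >= bellman(V) componentwise).
V = [25.0, 25.0]
assert all(V[i] >= bellman(V)[i] for i in range(2))

# Part (b)'s conclusion: every feasible V dominates the optimal W.
assert all(V[i] >= W[i] for i in range(2))
print("W approx:", [round(w, 3) for w in W])
```

The same check would apply to any other feasible \(V\); feasibility plus the fixed-point property of \(W\) is exactly what makes \(W\) the minimal solution of the LP.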
Source: Kevin J. Hastings, Introduction to the Mathematics of Operations Research with Mathematica, 1st edition, ISBN 9781574446128.