Question:
Consider an infinite horizon Markov decision problem with the usual notation, and let \(V\) be a function on the state space \(E\) satisfying the constraints of the linear programming problem (8). Adjoin to \(E\) an absorbing state \(\Delta\) carrying no reward, and adjoin to the action space an action \(a(\Delta)\) such that \(r(i, a(\Delta))=V(i)\) and \(T(i, \Delta ; a(\Delta))=1\). In other words, under action \(a(\Delta)\) the chain moves immediately to the absorbing state \(\Delta\), earning the one-time reward \(V(i)\) for that move and no reward thereafter.
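The statement of (8) is not reproduced in this excerpt; the sketch below assumes the standard linear programming formulation for a discounted MDP with discount factor \(\beta\), which is the form used in most treatments. If the book's (8) differs in detail, the constraint shape should still be analogous.

```latex
% Assumed form of the linear programming problem (8): find the
% componentwise-smallest function satisfying the Bellman inequalities.
\[
\text{minimize } \sum_{i \in E} V(i)
\quad \text{subject to} \quad
V(i) \;\ge\; r(i,a) + \beta \sum_{j \in E} T(i,j;a)\, V(j)
\qquad \text{for all } i \in E,\ a \in A.
\]
```

Under this form, the constraint contributed by the adjoined action \(a(\Delta)\) reads \(V(i) \ge r(i, a(\Delta)) + \beta V(\Delta) = V(i) + 0\), which holds with equality; that is the observation that drives part (a).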
(a) For the new problem, use (8) to show that the optimal policy is to take action \(a(\Delta)\) immediately.
(b) Deduce from (a) that \(V\) exceeds the optimal value function \(W\). Conclude that \(W\) satisfies the LP problem (8).
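As a numerical sanity check of the conclusion in part (b) (not part of the book's solution): on a toy two-state discounted MDP, any function \(V\) feasible for the Bellman inequalities dominates the optimal value function \(W\) obtained by value iteration. The MDP data, the discount factor, and the particular feasible \(V\) below are all invented for illustration.

```python
# Toy check of part (b): if V satisfies the LP constraints
#   V(i) >= r(i,a) + beta * sum_j T(i,j;a) V(j)  for every action a,
# then V dominates the optimal value function W.
# All numbers here are made up for illustration.

beta = 0.9  # discount factor (assumed)

# Two states {0,1}, two actions {0,1}: r[i][a] is the reward,
# P[i][a][j] the transition probability to state j.
r = [[1.0, 2.0], [0.5, 1.5]]
P = [
    [[0.7, 0.3], [0.2, 0.8]],
    [[0.4, 0.6], [0.9, 0.1]],
]

def bellman(V):
    """One Bellman optimality backup (max over actions at each state)."""
    return [
        max(r[i][a] + beta * sum(P[i][a][j] * V[j] for j in range(2))
            for a in range(2))
        for i in range(2)
    ]

# Approximate the optimal value function W by value iteration.
W = [0.0, 0.0]
for _ in range(2000):
    W = bellman(W)

# A feasible V for the LP constraints: a large constant vector works
# because rewards are bounded (V >= bellman(V) componentwise).
V = [25.0, 25.0]
assert all(V[i] >= bellman(V)[i] for i in range(2))

# Part (b)'s conclusion: every feasible V dominates the optimal W.
assert all(V[i] >= W[i] for i in range(2))
print("W approx:", [round(w, 3) for w in W])
```

The same check would apply to any other feasible \(V\); feasibility plus the fixed-point property of \(W\) is exactly what makes \(W\) the minimal solution of the LP.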
Source: Kevin J. Hastings, Introduction to the Mathematics of Operations Research with Mathematica, 1st edition, ISBN 9781574446128.