Question:
For the two-state, two-action Markov decision process with the transition matrices and per-period reward function given below, consider the finite-horizon problem with time horizon \(T=4\) and terminal rewards \(R(1)=2\), \(R(2)=1\). Find the optimal policy.
Step by Step Answer:
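A finite-horizon problem of this kind is normally solved by backward induction: start from the terminal rewards at time \(T\) and, for each earlier period, pick in every state the action that maximizes the one-period reward plus the expected value of the next period. The Python sketch below illustrates that procedure only. The transition matrices `P` and per-period rewards `r` in it are hypothetical placeholders, because the problem's actual data is not reproduced here, so its output is not the answer to the exercise. It also assumes no discounting and that decisions are made at times \(0, \dots, T-1\) with the terminal reward collected at time \(T\); the textbook's indexing may differ.

```python
def backward_induction(P, r, terminal, T):
    """Backward induction for a small finite-horizon MDP (no discounting).

    P[a][s][s2]: probability of moving from state s to s2 under action a.
    r[s][a]:     reward earned in one period in state s under action a.
    terminal[s]: terminal reward collected at time T in state s.
    Returns (V, policy): V[t][s] is the optimal value-to-go at time t,
    policy[t][s] is a maximizing action (0-based index) at time t.
    """
    n_states = len(terminal)
    n_actions = len(P)
    V = [[0.0] * n_states for _ in range(T + 1)]
    policy = [[0] * n_states for _ in range(T)]
    V[T] = [float(x) for x in terminal]

    # Work backward from the last decision epoch to the first.
    for t in range(T - 1, -1, -1):
        for s in range(n_states):
            # Value of each action: immediate reward + expected future value.
            q = [
                r[s][a] + sum(P[a][s][s2] * V[t + 1][s2] for s2 in range(n_states))
                for a in range(n_actions)
            ]
            best = max(range(n_actions), key=lambda a: q[a])
            policy[t][s] = best
            V[t][s] = q[best]
    return V, policy


# HYPOTHETICAL placeholder data: NOT the matrices or rewards from the exercise.
P = [
    [[0.5, 0.5], [0.5, 0.5]],  # placeholder transition matrix for action 1
    [[1.0, 0.0], [0.0, 1.0]],  # placeholder transition matrix for action 2
]
r = [[1.0, 0.0], [0.0, 2.0]]   # placeholder per-period rewards r[s][a]
terminal = [2.0, 1.0]          # R(1)=2, R(2)=1, as given in the question

V, policy = backward_induction(P, r, terminal, T=4)
for t, actions in enumerate(policy):
    # Report states and actions with the question's 1-based labels.
    print(f"t={t}:", {s + 1: a + 1 for s, a in enumerate(actions)})
```

To apply the sketch to the actual exercise, replace `P` and `r` with the matrices and reward function from the problem statement; the optimal policy is then the table `policy[t][s]`, which may depend on the time period as well as the state.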
Related Book For
Introduction To The Mathematics Of Operations Research With Mathematica
ISBN: 9781574446128
1st Edition
Authors: Kevin J Hastings