
Question:

1. Select a specific member of the set of policies that are optimal for R(s) > 0, as shown in Figure 2(b), and calculate the fraction of time the agent spends in each state in the limit, if the policy is executed forever. (Hint: Construct the state-to-state transition probability matrix corresponding to the policy.)
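The hint amounts to building a Markov chain. Once a policy is fixed, row s of the transition matrix P holds the probabilities of each next state when the agent in s takes the policy's action. The long-run fraction of time spent in each state is then the stationary distribution π satisfying π P = π, with the entries of π summing to 1. The sketch below assumes the chain induced by the chosen policy is irreducible (every state it visits can reach every other), which makes π unique regardless of the start state. It does not reproduce the actual grid or policy from Figure 2(b). `policy_transition_matrix`, `stationary_distribution`, the `transition(s, a)` interface, and the 3-state toy model are all hypothetical illustrations.

```python
from fractions import Fraction


def _exact(x):
    # Convert ints, floats like 0.8, or Fractions to an exact Fraction via their string form.
    return Fraction(str(x))


def policy_transition_matrix(states, policy, transition):
    """Hypothetical helper: row i holds Pr(next state | states[i], policy[states[i]]).

    `transition(s, a)` is assumed to return a dict {next_state: probability}.
    """
    index = {s: k for k, s in enumerate(states)}
    P = [[Fraction(0)] * len(states) for _ in states]
    for s in states:
        for s_next, p in transition(s, policy[s]).items():
            P[index[s]][index[s_next]] += _exact(p)
    return P


def stationary_distribution(P):
    """Solve pi P = pi, sum(pi) = 1 exactly for a row-stochastic matrix P.

    Assumes the chain is irreducible, so the long-run fraction of time is unique.
    """
    n = len(P)
    # Row i encodes sum_j pi_j * P[j][i] - pi_i = 0, i.e. (P^T - I) pi = 0.
    A = [[_exact(P[j][i]) - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    # Those n equations are linearly dependent, so replace the last one with normalization.
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    # Gauss-Jordan elimination with partial pivoting (exact arithmetic on Fractions).
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0:
            raise ValueError("singular system: the chain is probably not irreducible")
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]


# Toy 3-state model (hypothetical, NOT the grid world of Figure 2(b)).
toy_model = {
    ("a", "go"): {"a": 0.5, "b": 0.5},
    ("b", "go"): {"a": 0.25, "b": 0.5, "c": 0.25},
    ("c", "go"): {"b": 0.5, "c": 0.5},
}
states = ["a", "b", "c"]
policy = {s: "go" for s in states}
P = policy_transition_matrix(states, policy, lambda s, a: toy_model[(s, a)])
print(stationary_distribution(P))  # [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
```

Exact `Fraction` arithmetic is used so the answer comes out as the same fractions a hand calculation would give; with floating-point entries one could instead solve the same linear system with `numpy.linalg.solve`. For the actual exercise, the states, the chosen optimal policy, and the transition model would come from the problem's figure.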
