1 A proper policy for is MDP is one that is guaranteed to reach a terminal state....

Question:

1 A proper policy for is MDP is one that is guaranteed to reach a terminal state. Show that it is possible for a passive ADP agent to learn a transition model for which its policy is improper even if is proper for the true MDP; with such models, the POLICY-EVALUATION step may fail if . Show that this problem cannot arise if POLICY-EVALUATION is appl ied to the learned model only at the end of a trial.

Fantastic news! We've Found the answer you've been seeking!

Step by Step Answer:

Related Book For  book-img-for-question
Question Posted: