Answered step by step
Verified Expert Solution
Link Copied!

Question

1 Approved Answer

Problem 3. (50pt) Consider an infinite horizon MDP, characterized by M = (S, A, r,p,y) and r : S A [0,1]. We would like

image text in transcribed

Problem 3. (50pt) Consider an infinite horizon MDP, characterized by M = (S, A, r,p,y) and r : S A [0,1]. We would like to evaluate the value of a Markov stationary policy : S A(A). However, we do not know the transition kernel p. Rather than applying a model-free approach, we decided to use a model-based approach where we first estimate the underlying transition kernel by follow some fully stochastic policy in the MDP (for good exploration) and observe the triples (Sk, ak, Sk+1) S A S for k 0,1,.... Let be our estimate of p based on the data collected. Now, we can apply value iteration directly as if the underlying MDP is M = (S, A, r,p,y) and obtain T. = Prove the simulation lemma bounding the difference between and the true value of the policy, denoted by v", by showing that v (so) (so)| < (1) Es~d~(s) ||( | s, a) p( | s, a)||1, where so is the initial state and do is the discounted state visitation distribution under policy . Note that the difference |vT (80) (so)| gets smaller with the smaller model approximation error ||( | s, a) p( | s, a)||1. However, the impact of model approximation error gets larger with 1 as the approximation error propagates more across stages.

Step by Step Solution

There are 3 Steps involved in it

Step: 1

blur-text-image

Get Instant Access to Expert-Tailored Solutions

See step-by-step solutions with expert insights and AI powered tools for academic success

Step: 2

blur-text-image

Step: 3

blur-text-image

Ace Your Homework with AI

Get the answers you need in no time with our AI-driven, step-by-step assistance

Get Started

Recommended Textbook for

Applied Linear Algebra

Authors: Peter J. Olver, Cheri Shakiban

1st edition

131473824, 978-0131473829

More Books

Students also viewed these Mathematics questions

Question

1. Walk to the child, look into his or her eyes.

Answered: 1 week ago

Question

Could this situation be avoided? If no, why? If yes, how?

Answered: 1 week ago