
Question


Problem 3. (50 pt) Consider an infinite-horizon MDP characterized by $M = \langle S, A, r, p, \gamma \rangle$ with $r : S \times A \to [0, 1]$. We would like to evaluate the value of a Markov stationary policy $\pi : S \to \Delta(A)$. However, we do not know the transition kernel $p$. Rather than applying a model-free approach, we decide to use a model-based approach: we first estimate the underlying transition kernel by following some fully stochastic policy in the MDP (for good exploration) and observing the triples $(s_k, a_k, s_{k+1}) \in S \times A \times S$ for $k = 0, 1, \dots$ Let $\widehat{p}$ be our estimate of $p$ based on the data collected. We can then apply value iteration directly, as if the underlying MDP were $\widehat{M} = \langle S, A, r, \widehat{p}, \gamma \rangle$, and obtain $\widehat{v}^{\pi}$.

Prove the simulation lemma bounding the difference between $\widehat{v}^{\pi}$ and the true value of the policy, denoted by $v^{\pi}$, by showing that

$$\left| v^{\pi}(s_0) - \widehat{v}^{\pi}(s_0) \right| \le \frac{\gamma}{(1-\gamma)^2} \, \mathbb{E}_{s \sim d^{\pi}_{s_0},\, a \sim \pi(s)} \left\| \widehat{p}(\cdot \mid s, a) - p(\cdot \mid s, a) \right\|_1,$$

where $s_0$ is the initial state and $d^{\pi}_{s_0}$ is the discounted state visitation distribution under policy $\pi$. Note that the difference $|v^{\pi}(s_0) - \widehat{v}^{\pi}(s_0)|$ gets smaller as the model approximation error $\|\widehat{p}(\cdot \mid s, a) - p(\cdot \mid s, a)\|_1$ gets smaller. However, the impact of the model approximation error gets larger with $\gamma \approx 1$, as the approximation error propagates across more stages.
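The model-based procedure and the bound can be checked numerically. The following is a minimal sketch, not a proof and not the hidden solution. It makes several assumptions that are not in the problem statement: a small random MDP, a uniform behaviour policy for exploration, add-one smoothing in the count-based estimate, exact policy evaluation by a linear solve in place of value iteration, and the normalized visitation $d^{\pi}_{s_0} = (1-\gamma)\, e_{s_0}^{\top} (I - \gamma P^{\pi})^{-1}$ computed in the true MDP. All function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma, s0 = 5, 3, 0.9, 0

# Hypothetical random MDP: p[s, a] is a distribution over next states, r lies in [0, 1].
p = rng.dirichlet(np.ones(nS), size=(nS, nA))   # shape (nS, nA, nS)
r = rng.uniform(0.0, 1.0, size=(nS, nA))
pi = rng.dirichlet(np.ones(nA), size=nS)        # policy to evaluate, shape (nS, nA)

def estimate_kernel(p, n_steps, rng):
    """Count-based estimate of p from triples (s_k, a_k, s_{k+1})
    collected under a uniform (fully stochastic) behaviour policy."""
    n_states, n_actions, _ = p.shape
    counts = np.ones((n_states, n_actions, n_states))  # add-one smoothing (assumption)
    s = 0
    for _ in range(n_steps):
        a = rng.integers(n_actions)
        s_next = rng.choice(n_states, p=p[s, a])
        counts[s, a, s_next] += 1
        s = s_next
    return counts / counts.sum(axis=2, keepdims=True)

def state_kernel(p, pi):
    """State-to-state kernel P_pi[s, s'] = sum_a pi(a|s) p(s'|s, a)."""
    return np.einsum("sa,sat->st", pi, p)

def policy_value(p, r, pi, gamma):
    """Exact policy evaluation: solve (I - gamma * P_pi) v = r_pi."""
    r_pi = (pi * r).sum(axis=1)
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * state_kernel(p, pi), r_pi)

def visitation(p, pi, gamma, s0):
    """Normalized discounted state visitation distribution started at s0."""
    n_states = p.shape[0]
    e = np.zeros(n_states)
    e[s0] = 1.0
    A = np.eye(n_states) - gamma * state_kernel(p, pi)
    return (1 - gamma) * np.linalg.solve(A.T, e)

p_hat = estimate_kernel(p, n_steps=5000, rng=rng)
v = policy_value(p, r, pi, gamma)
v_hat = policy_value(p_hat, r, pi, gamma)
d = visitation(p, pi, gamma, s0)

l1 = np.abs(p_hat - p).sum(axis=2)              # L1 model error per (s, a)
expected_err = (d[:, None] * pi * l1).sum()     # E_{s ~ d, a ~ pi(s)} of the L1 error
lhs = abs(v[s0] - v_hat[s0])
rhs = gamma / (1 - gamma) ** 2 * expected_err
print(f"|v - v_hat| at s0 = {lhs:.4f}, bound = {rhs:.4f}")
```

Increasing `n_steps` shrinks the model error and hence both sides of the inequality, while moving `gamma` toward 1 inflates the $\gamma/(1-\gamma)^2$ factor, which matches the remark at the end of the problem statement.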
