Question


Problem 3 (50 pt). Consider an infinite-horizon MDP characterized by $M = \langle S, A, r, p, \gamma \rangle$ with $r : S \times A \to [0,1]$. We would like to evaluate the value of a Markov stationary policy $\pi : S \to \Delta(A)$. However, we do not know the transition kernel $p$. Rather than applying a model-free approach, we decide to use a model-based approach: we first estimate the underlying transition kernel by following some fully stochastic policy in the MDP (for good exploration) and observing the triples $(s_k, a_k, s_{k+1}) \in S \times A \times S$ for $k = 0, 1, \dots$. Let $\widehat{p}$ be our estimate of $p$ based on the collected data. We can then apply value iteration directly, as if the underlying MDP were $\widehat{M} = \langle S, A, r, \widehat{p}, \gamma \rangle$, and obtain $\widehat{v}^\pi$.

Prove the simulation lemma, which bounds the difference between $\widehat{v}^\pi$ and the true value of the policy, denoted by $v^\pi$, by showing that

$$\big| v^\pi(s_0) - \widehat{v}^\pi(s_0) \big| \le \frac{\gamma}{(1-\gamma)^2}\, \mathbb{E}_{s \sim d^\pi_{s_0},\, a \sim \pi(\cdot \mid s)} \big\| \widehat{p}(\cdot \mid s,a) - p(\cdot \mid s,a) \big\|_1,$$

where $s_0$ is the initial state and $d^\pi_{s_0}$ is the discounted state visitation distribution under policy $\pi$.

Note that the difference $|v^\pi(s_0) - \widehat{v}^\pi(s_0)|$ shrinks as the model approximation error $\|\widehat{p}(\cdot \mid s,a) - p(\cdot \mid s,a)\|_1$ shrinks. However, the impact of the model approximation error grows as $\gamma \to 1$, because the error propagates across more stages.
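A proof sketch of the bound, written as a worked derivation. It assumes, beyond what the problem states explicitly, that $d^\pi_{s_0}(s) = (1-\gamma)\sum_{t \ge 0}\gamma^t \Pr(s_t = s \mid s_0, \pi, p)$ is the *normalized* discounted visitation distribution under the true kernel $p$. It also uses the shorthand $r^\pi(s) = \sum_a \pi(a \mid s)\, r(s,a)$ and $P^\pi(s' \mid s) = \sum_a \pi(a \mid s)\, p(s' \mid s,a)$, with $\widehat{P}^\pi$ defined the same way from $\widehat{p}$.

```latex
\begin{aligned}
v^\pi &= r^\pi + \gamma P^\pi v^\pi, \qquad
  \widehat{v}^\pi = r^\pi + \gamma \widehat{P}^\pi \widehat{v}^\pi \\
v^\pi - \widehat{v}^\pi
  &= \gamma P^\pi (v^\pi - \widehat{v}^\pi) + \gamma (P^\pi - \widehat{P}^\pi)\,\widehat{v}^\pi \\
  &= \gamma (I - \gamma P^\pi)^{-1} (P^\pi - \widehat{P}^\pi)\,\widehat{v}^\pi \\
v^\pi(s_0) - \widehat{v}^\pi(s_0)
  &= \frac{\gamma}{1-\gamma}\,
     \mathbb{E}_{s \sim d^\pi_{s_0},\, a \sim \pi(\cdot \mid s)}
     \Big[ \big(p(\cdot \mid s,a) - \widehat{p}(\cdot \mid s,a)\big)^{\top} \widehat{v}^\pi \Big]
  && \text{since } \big[(I - \gamma P^\pi)^{-1}\big]_{s_0, s}
     = \textstyle\sum_{t} \gamma^t \Pr(s_t = s \mid s_0) = \tfrac{d^\pi_{s_0}(s)}{1-\gamma} \\
\big| v^\pi(s_0) - \widehat{v}^\pi(s_0) \big|
  &\le \frac{\gamma}{1-\gamma}\,
     \mathbb{E}_{s \sim d^\pi_{s_0},\, a \sim \pi(\cdot \mid s)}
     \Big[ \big\| \widehat{p}(\cdot \mid s,a) - p(\cdot \mid s,a) \big\|_1 \,
     \|\widehat{v}^\pi\|_\infty \Big]
  && \text{(Jensen, then H\"older)} \\
  &\le \frac{\gamma}{(1-\gamma)^2}\,
     \mathbb{E}_{s \sim d^\pi_{s_0},\, a \sim \pi(\cdot \mid s)}
     \big\| \widehat{p}(\cdot \mid s,a) - p(\cdot \mid s,a) \big\|_1
  && \text{since } r \in [0,1] \Rightarrow 0 \le \widehat{v}^\pi \le \tfrac{1}{1-\gamma}
\end{aligned}
```

The identity on the fourth line is exact; only the last two steps lose tightness. One factor of $1/(1-\gamma)$ comes from the error propagating through $(I - \gamma P^\pi)^{-1}$, and the second comes from the worst-case size of $\widehat{v}^\pi$, which is why the bound degrades as $\gamma \to 1$.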
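As a hedged numerical sanity check (not part of the original problem or its solution), the sketch below builds a small random MDP, forms a hypothetical estimate $\widehat{p}$ by mixing $p$ with random noise, runs iterative policy evaluation under both kernels, and compares $|v^\pi(s_0) - \widehat{v}^\pi(s_0)|$ with the right-hand side of the bound. All function names, problem sizes, the noise model, and the truncation horizons are illustrative assumptions. $d^\pi_{s_0}$ is taken to be the normalized discounted visitation distribution under the true kernel $p$.

```python
import random


def random_dist(rng, n):
    """Return a random probability vector of length n (strictly positive entries)."""
    w = [rng.random() + 1e-3 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


def evaluate_policy(p, r, pi, gamma, iters=2000):
    """Iterative policy evaluation: repeatedly apply v <- r_pi + gamma * P_pi v."""
    n_s, n_a = len(r), len(r[0])
    v = [0.0] * n_s
    for _ in range(iters):
        v = [
            sum(
                pi[s][a] * (r[s][a] + gamma * sum(p[s][a][t] * v[t] for t in range(n_s)))
                for a in range(n_a)
            )
            for s in range(n_s)
        ]
    return v


def visitation(p, pi, gamma, s0, horizon=2000):
    """Normalized discounted visitation d(s) = (1 - gamma) * sum_t gamma^t Pr(s_t = s),
    truncated after `horizon` steps (the neglected mass is gamma**horizon)."""
    n_s, n_a = len(p), len(pi[0])
    mu = [1.0 if s == s0 else 0.0 for s in range(n_s)]  # distribution of s_t
    d = [0.0] * n_s
    disc = 1.0
    for _ in range(horizon):
        for s in range(n_s):
            d[s] += (1 - gamma) * disc * mu[s]
        # Push the state distribution one step forward under pi and the true kernel p.
        mu = [
            sum(mu[s] * pi[s][a] * p[s][a][t] for s in range(n_s) for a in range(n_a))
            for t in range(n_s)
        ]
        disc *= gamma
    return d


def simulation_lemma_check(n_s=4, n_a=2, gamma=0.9, noise=0.2, seed=0, s0=0):
    """Return (|v(s0) - v_hat(s0)|, right-hand side of the simulation-lemma bound)."""
    rng = random.Random(seed)
    r = [[rng.random() for _ in range(n_a)] for _ in range(n_s)]  # rewards in [0, 1)
    pi = [random_dist(rng, n_a) for _ in range(n_s)]
    p = [[random_dist(rng, n_s) for _ in range(n_a)] for _ in range(n_s)]
    # Hypothetical estimated kernel: convex mixture of p with a random distribution.
    p_hat = [
        [
            [(1 - noise) * x + noise * y for x, y in zip(p[s][a], random_dist(rng, n_s))]
            for a in range(n_a)
        ]
        for s in range(n_s)
    ]
    v = evaluate_policy(p, r, pi, gamma)
    v_hat = evaluate_policy(p_hat, r, pi, gamma)
    d = visitation(p, pi, gamma, s0)
    # Expected L1 model error under s ~ d, a ~ pi(.|s).
    err = sum(
        d[s] * pi[s][a] * sum(abs(ph - pt) for ph, pt in zip(p_hat[s][a], p[s][a]))
        for s in range(n_s)
        for a in range(n_a)
    )
    gap = abs(v[s0] - v_hat[s0])
    bound = gamma / (1 - gamma) ** 2 * err
    return gap, bound
```

For example, `gap, bound = simulation_lemma_check(seed=1)` should satisfy `gap <= bound`. Setting `noise=0.0` makes $\widehat{p} = p$, so both sides are zero; the bound can be quite loose otherwise, since it relies on the worst case $\|\widehat{v}^\pi\|_\infty \le 1/(1-\gamma)$.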

