
Question


Problem 3. (50 pt) Consider an infinite-horizon MDP characterized by $\mathcal{M} = \langle \mathcal{S}, \mathcal{A}, r, p, \gamma \rangle$ with $r : \mathcal{S} \times \mathcal{A} \to [0,1]$. We would like to evaluate the value of a Markov stationary policy $\pi : \mathcal{S} \to \Delta(\mathcal{A})$. However, we do not know the transition kernel $p$. Rather than applying a model-free approach, we decided to use a model-based approach where we first estimate the underlying transition kernel by following some fully stochastic policy in the MDP (for good exploration) and observing the triples $(s_k, a_k, s_{k+1}) \in \mathcal{S} \times \mathcal{A} \times \mathcal{S}$ for $k = 0, 1, \dots$. Let $\hat{p}$ be our estimate of $p$ based on the data collected. Now we can apply value iteration directly as if the underlying MDP were $\widehat{\mathcal{M}} = \langle \mathcal{S}, \mathcal{A}, r, \hat{p}, \gamma \rangle$ and obtain $\hat{v}^{\pi}$.

Prove the simulation lemma bounding the difference between $\hat{v}^{\pi}$ and the true value of the policy, denoted by $v^{\pi}$, by showing that
$$|v^{\pi}(s_0) - \hat{v}^{\pi}(s_0)| \;\le\; \frac{\gamma}{(1-\gamma)^2}\, \mathbb{E}_{s \sim d^{\pi}_{s_0},\, a \sim \pi(\cdot \mid s)} \left\| \hat{p}(\cdot \mid s, a) - p(\cdot \mid s, a) \right\|_1,$$
where $s_0$ is the initial state and $d^{\pi}_{s_0}$ is the discounted state visitation distribution under policy $\pi$. Note that the difference $|v^{\pi}(s_0) - \hat{v}^{\pi}(s_0)|$ shrinks as the model approximation error $\| \hat{p}(\cdot \mid s, a) - p(\cdot \mid s, a) \|_1$ shrinks. However, the impact of the model approximation error grows as $\gamma \to 1$, since the approximation error propagates across more stages.

You may look up the simulation lemma online, but you need to understand the proof and write it in your own words.
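A common starting point for proofs of this form (offered here only as a hint, not as the required write-up) is the identity relating the two value functions of the same policy under the two kernels,
$$v^{\pi} - \hat{v}^{\pi} \;=\; \gamma\, (I - \gamma P^{\pi})^{-1} \left( P^{\pi} - \widehat{P}^{\pi} \right) \hat{v}^{\pi},$$
where $P^{\pi}$ and $\widehat{P}^{\pi}$ are the state-transition matrices induced by $\pi$ under $p$ and $\hat{p}$, respectively. Reading off the $s_0$ entry, identifying $(1-\gamma)\, e_{s_0}^{\top} (I - \gamma P^{\pi})^{-1}$ with $d^{\pi}_{s_0}$, and applying Hölder's inequality together with $\|\hat{v}^{\pi}\|_{\infty} \le \frac{1}{1-\gamma}$ (which holds because $r \in [0,1]$) leads to a bound of the stated form.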
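As a numerical sanity check on the statement (not part of the original problem), the sketch below sets up an arbitrary small tabular MDP, collects triples with a uniformly random behavior policy, builds a count-based, smoothed estimate $\hat{p}$, evaluates a fixed policy $\pi$ under both $p$ and $\hat{p}$, and compares the value gap at $s_0$ with the right-hand side of the bound. The sizes $|\mathcal{S}|=5$, $|\mathcal{A}|=3$, $\gamma=0.9$ and the smoothing constant are illustrative choices, and for brevity it uses exact policy evaluation (a linear solve) in place of value iteration.

```python
# Hypothetical sketch: estimate p_hat from observed (s, a, s') triples, evaluate a fixed
# policy pi under both the true kernel p and the estimate p_hat, and compare the value gap
# at s0 with the simulation-lemma bound
#   gamma / (1 - gamma)^2 * E_{s ~ d_{s0}^pi, a ~ pi(.|s)} || p_hat(.|s,a) - p(.|s,a) ||_1.
import numpy as np

rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9          # illustrative sizes and discount factor

# Random ground-truth MDP with rewards in [0, 1] and a random target policy pi.
p = rng.dirichlet(np.ones(S), size=(S, A))    # p[s, a] is a distribution over next states
r = rng.uniform(0.0, 1.0, size=(S, A))
pi = rng.dirichlet(np.ones(A), size=S)        # pi[s, a] = pi(a | s)

# Collect transitions with a fully stochastic (uniform) behavior policy and count them.
counts = np.zeros((S, A, S))
s = 0
for _ in range(100_000):
    a = rng.integers(A)
    s_next = rng.choice(S, p=p[s, a])
    counts[s, a, s_next] += 1
    s = s_next
p_hat = (counts + 1e-6) / (counts + 1e-6).sum(axis=2, keepdims=True)   # smoothed estimate

def policy_eval(kernel):
    """Exact policy evaluation: v = (I - gamma * P_pi)^{-1} r_pi."""
    P_pi = np.einsum("sa,sat->st", pi, kernel)
    r_pi = np.einsum("sa,sa->s", pi, r)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

v, v_hat = policy_eval(p), policy_eval(p_hat)

# Discounted state visitation: d_{s0}^pi = (1 - gamma) * e_{s0}^T (I - gamma * P_pi)^{-1}.
s0 = 0
P_pi = np.einsum("sa,sat->st", pi, p)
d = (1 - gamma) * np.linalg.solve((np.eye(S) - gamma * P_pi).T, np.eye(S)[s0])

l1_err = np.abs(p_hat - p).sum(axis=2)        # ||p_hat(.|s,a) - p(.|s,a)||_1 for each (s, a)
expected_err = np.einsum("s,sa,sa->", d, pi, l1_err)
bound = gamma / (1 - gamma) ** 2 * expected_err

print(f"|v(s0) - v_hat(s0)| = {abs(v[s0] - v_hat[s0]):.4f}  <=  bound = {bound:.4f}")
```

The printed gap should not exceed the printed bound; the bound is typically loose, since the lemma holds for any estimated kernel.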
