Question
Problem 3. (50 pt) Consider an infinite-horizon MDP characterized by the tuple $\mathcal{M} = (\mathcal{S}, \mathcal{A}, P, r, \gamma)$, where $P : \mathcal{S} \times \mathcal{A} \to \Delta(\mathcal{S})$ is the transition kernel, $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ is the reward function, and $\gamma \in (0, 1)$ is the discount factor. We would like to evaluate the value of a Markov stationary policy $\pi : \mathcal{S} \to \Delta(\mathcal{A})$. However, we do not know the transition kernel $P$. Rather than applying a model-free approach, we decide to use a model-based approach: we first estimate the underlying transition kernel by following some fully stochastic policy in the MDP (for good exploration) and observing the triples $(s_t, a_t, s_{t+1}) \in \mathcal{S} \times \mathcal{A} \times \mathcal{S}$ for $t = 0, 1, \ldots$. Let $\widehat{P}$ be our estimate of $P$ based on the data collected. Now, we can apply value iteration directly as if the underlying MDP were $\widehat{\mathcal{M}} = (\mathcal{S}, \mathcal{A}, \widehat{P}, r, \gamma)$ and obtain $\widehat{V}^{\pi}$.

Prove the simulation lemma bounding the difference between $\widehat{V}^{\pi}$ and the true value of the policy, denoted by $V^{\pi}$, by showing that

$$\widehat{V}^{\pi}(s_0) - V^{\pi}(s_0) = \frac{\gamma}{1-\gamma}\, \mathbb{E}_{s \sim d_{s_0}^{\pi},\; a \sim \pi(\cdot \mid s)}\Big[\big(\widehat{P}(\cdot \mid s, a) - P(\cdot \mid s, a)\big)^{\top} \widehat{V}^{\pi}\Big],$$

where $s_0$ is the initial state and $d_{s_0}^{\pi}$ is the discounted state visitation distribution under policy $\pi$. Note that the difference $\widehat{V}^{\pi} - V^{\pi}$ gets smaller as the model approximation error $\widehat{P} - P$ gets smaller. However, the impact of the model approximation error grows with $\gamma$, as the approximation error propagates more across stages.
Step by Step Solution
There are three steps involved.
Step: 1
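A natural starting point is to write the Bellman consistency equations for the policy $\pi$ in both MDPs. The operator shorthand $P^{\pi}$, $r^{\pi}$ below is ours, not given in the problem statement; this is one standard route to the identity. Since $\mathcal{M}$ and $\widehat{\mathcal{M}}$ share the same reward function, the reward terms will cancel when we subtract.

$$V^{\pi} = r^{\pi} + \gamma P^{\pi} V^{\pi}, \qquad \widehat{V}^{\pi} = r^{\pi} + \gamma \widehat{P}^{\pi} \widehat{V}^{\pi},$$

where $r^{\pi}(s) = \sum_{a} \pi(a \mid s)\, r(s, a)$ and $(P^{\pi} V)(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\, V(s')$.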
Step: 2
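Next, subtract the two Bellman equations and add and subtract $\gamma P^{\pi} \widehat{V}^{\pi}$ to isolate the model error. Sketched in the Step 1 notation:

$$\widehat{V}^{\pi} - V^{\pi} = \gamma \big(\widehat{P}^{\pi} - P^{\pi}\big) \widehat{V}^{\pi} + \gamma P^{\pi} \big(\widehat{V}^{\pi} - V^{\pi}\big),$$

so that

$$\widehat{V}^{\pi} - V^{\pi} = \big(I - \gamma P^{\pi}\big)^{-1} \gamma \big(\widehat{P}^{\pi} - P^{\pi}\big) \widehat{V}^{\pi},$$

where the inverse $\big(I - \gamma P^{\pi}\big)^{-1} = \sum_{t \ge 0} \gamma^{t} (P^{\pi})^{t}$ exists because $\gamma < 1$ and $P^{\pi}$ is a stochastic matrix.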
Step: 3
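Finally, evaluate the identity at the initial state $s_0$ and recognize the Neumann series as the discounted state visitation distribution, taken under the true kernel $P$ (the convention assumed here):

$$d_{s_0}^{\pi}(s) = (1-\gamma) \sum_{t \ge 0} \gamma^{t}\, \Pr(s_t = s \mid s_0, \pi), \qquad e_{s_0}^{\top} \big(I - \gamma P^{\pi}\big)^{-1} = \frac{1}{1-\gamma} \big(d_{s_0}^{\pi}\big)^{\top}.$$

Substituting this into the Step 2 identity and expanding $\big(\widehat{P}^{\pi} - P^{\pi}\big) \widehat{V}^{\pi}$ over actions gives

$$\widehat{V}^{\pi}(s_0) - V^{\pi}(s_0) = \frac{\gamma}{1-\gamma}\, \mathbb{E}_{s \sim d_{s_0}^{\pi},\; a \sim \pi(\cdot \mid s)}\Big[\big(\widehat{P}(\cdot \mid s, a) - P(\cdot \mid s, a)\big)^{\top} \widehat{V}^{\pi}\Big],$$

which is the claimed simulation lemma. The prefactor $\frac{\gamma}{1-\gamma}$ makes the remark in the problem precise: the gap shrinks with the model error but blows up as $\gamma \to 1$.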
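As a sanity check, the identity can also be verified numerically. The Python sketch below is ours, not part of the original problem: it builds a small random MDP, perturbs the kernel to play the role of $\widehat{P}$, solves both policy-evaluation systems exactly, and compares the two sides of the lemma. All names and constants are illustrative.

import numpy as np

rng = np.random.default_rng(0)
nS, nA, gamma = 5, 3, 0.9

# Random true kernel P, reward r, and a fixed stochastic policy pi.
P = rng.random((nS, nA, nS)); P /= P.sum(axis=2, keepdims=True)
r = rng.random((nS, nA))
pi = rng.random((nS, nA)); pi /= pi.sum(axis=1, keepdims=True)

# Perturbed model P_hat, standing in for the estimated kernel.
P_hat = P + 0.05 * rng.random((nS, nA, nS))
P_hat /= P_hat.sum(axis=2, keepdims=True)

def policy_value(P, r, pi, gamma):
    """Solve V = r_pi + gamma * P_pi V exactly."""
    P_pi = np.einsum('sa,sat->st', pi, P)   # state-to-state kernel under pi
    r_pi = np.einsum('sa,sa->s', pi, r)
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)

V = policy_value(P, r, pi, gamma)          # true value V^pi
V_hat = policy_value(P_hat, r, pi, gamma)  # model-based value V_hat^pi

s0 = 0
# Discounted visitation: d^T = (1 - gamma) e_{s0}^T (I - gamma P_pi)^{-1}.
P_pi = np.einsum('sa,sat->st', pi, P)
d = (1 - gamma) * np.linalg.solve((np.eye(nS) - gamma * P_pi).T, np.eye(nS)[s0])

# Right-hand side of the simulation lemma.
model_err = np.einsum('sat,t->sa', P_hat - P, V_hat)   # (P_hat - P)^T V_hat
rhs = gamma / (1 - gamma) * np.einsum('s,sa,sa->', d, pi, model_err)

print(V_hat[s0] - V[s0], rhs)   # the two numbers should agree

Running this prints the left- and right-hand sides of the identity; they match up to floating-point error, as the lemma predicts.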