Question
Problem 3. (50 pt) Consider an infinite-horizon MDP characterized by $\mathcal{M} = (\mathcal{S}, \mathcal{A}, r, P, \gamma)$, with transition kernel $P$ and discount factor $\gamma \in [0, 1)$. We would like to evaluate the value of a Markov stationary policy $\pi$. However, we do not know the transition kernel $P$. Rather than applying a model-free approach, we decide to use a model-based approach, where we first estimate the underlying transition kernel by following some fully stochastic policy in the MDP for good exploration and observing the triples $(s_t, a_t, s_{t+1})$ for $t = 0, 1, \ldots$. Let $\widehat{P}$ be our estimate of $P$ based on the data collected. Now, we can apply value iteration directly, as if the underlying MDP were $\widehat{\mathcal{M}} = (\mathcal{S}, \mathcal{A}, r, \widehat{P}, \gamma)$, and obtain $\widehat{V}^{\pi}$.
Prove the simulation lemma, bounding the difference between $\widehat{V}^{\pi}$ and the true value of the policy, denoted by $V^{\pi}$, by showing that
$$
V^{\pi}(s_0) - \widehat{V}^{\pi}(s_0)
= \frac{\gamma}{1 - \gamma}\,
\mathbb{E}_{s \sim d^{\pi}_{s_0},\, a \sim \pi(\cdot \mid s)}
\left[ \big( P(\cdot \mid s, a) - \widehat{P}(\cdot \mid s, a) \big)^{\top} \widehat{V}^{\pi} \right],
$$
where $s_0$ is the initial state and $d^{\pi}_{s_0}$ is the discounted state visitation distribution under policy $\pi$.
Note that the difference $V^{\pi} - \widehat{V}^{\pi}$ gets smaller as the model approximation error $\big\| P(\cdot \mid s, a) - \widehat{P}(\cdot \mid s, a) \big\|_{1}$ gets smaller. However, the impact of the model approximation error gets larger as $\gamma \to 1$, since the factor $\frac{\gamma}{1 - \gamma}$ grows and the approximation error propagates more across stages.
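One standard route to this identity (a sketch only; the notation $r_\pi$, $P_\pi$, $\widehat{P}_\pi$ for the reward vector and the state-to-state kernels induced by $\pi$ is an assumption, not fixed by the problem) starts from the two Bellman evaluation equations
$$
V^{\pi} = r_{\pi} + \gamma P_{\pi} V^{\pi},
\qquad
\widehat{V}^{\pi} = r_{\pi} + \gamma \widehat{P}_{\pi} \widehat{V}^{\pi}.
$$
Subtracting the second from the first and adding and subtracting $\gamma P_{\pi} \widehat{V}^{\pi}$ gives
$$
V^{\pi} - \widehat{V}^{\pi}
= \gamma P_{\pi} \big( V^{\pi} - \widehat{V}^{\pi} \big)
+ \gamma \big( P_{\pi} - \widehat{P}_{\pi} \big) \widehat{V}^{\pi},
\quad\text{so}\quad
V^{\pi} - \widehat{V}^{\pi}
= \gamma \big( I - \gamma P_{\pi} \big)^{-1} \big( P_{\pi} - \widehat{P}_{\pi} \big) \widehat{V}^{\pi}.
$$
Reading off the $s_0$ coordinate and using $(1 - \gamma)\, e_{s_0}^{\top} (I - \gamma P_{\pi})^{-1} = d^{\pi}_{s_0}$ (the discounted visitation distribution viewed as a row vector) produces the $\frac{\gamma}{1 - \gamma}$ factor and the stated expectation.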
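For concreteness, here is a minimal Python sketch of the model-based pipeline the problem describes: build the empirical kernel $\widehat{P}$ from observed $(s, a, s')$ triples, then evaluate the fixed policy $\pi$ on the estimated MDP. The function names and the uniform fallback for unvisited state-action pairs are illustrative assumptions, not part of the problem statement.

```python
import numpy as np

def estimate_kernel(triples, n_states, n_actions):
    """Empirical estimate P_hat(s' | s, a) from observed (s, a, s') triples."""
    counts = np.zeros((n_states, n_actions, n_states))
    for s, a, s_next in triples:
        counts[s, a, s_next] += 1.0
    totals = counts.sum(axis=2, keepdims=True)
    # Assumption: fall back to a uniform distribution for unvisited (s, a) pairs.
    return np.where(totals > 0, counts / np.maximum(totals, 1.0), 1.0 / n_states)

def evaluate_policy(P, r, pi, gamma, tol=1e-8):
    """Fixed-point iteration of the policy's Bellman operator: V = r_pi + gamma * P_pi V."""
    # Reward vector and state-to-state kernel induced by pi.
    r_pi = np.einsum('sa,sa->s', pi, r)    # r_pi(s) = sum_a pi(a|s) r(s, a)
    P_pi = np.einsum('sa,sat->st', pi, P)  # P_pi(s'|s) = sum_a pi(a|s) P(s'|s, a)
    V = np.zeros(P.shape[0])
    while True:
        V_new = r_pi + gamma * P_pi @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

Running `evaluate_policy` once with the true kernel $P$ and once with the estimate $\widehat{P}$ gives exactly the two values $V^{\pi}$ and $\widehat{V}^{\pi}$ whose gap the simulation lemma controls.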