

Question


Posted on Sep 22, 2024


Consider the infinite MDP with discount factor $\gamma < 1$ illustrated in Figure 1. It consists of 3 states, and rewards are given upon taking an action from the state. From state $s_{0}$, action $a_{1}$ has zero immediate reward and causes a deterministic transition to state $s_{1}$, where there is reward $+1$ for every time step afterwards (regardless of action). From state $s_{0}$, action $a_{2}$ causes a deterministic transition to state $s_{2}$ with immediate reward of $\frac{\gamma^{2}}{1 - \gamma}$, but state $s_{2}$ has zero reward for every time step afterwards (regardless of action).

Figure 1: infinite 3-state MDP

(a) What is the total discounted return $\left(\sum_{t = 0}^{\infty} \gamma^{t} r_{t}\right)$ of taking action $a_{1}$ from state $s_{0}$ at time step $t = 0$? [5 pts]

(b) What is the total discounted return $\left(\sum_{t = 0}^{\infty} \gamma^{t} r_{t}\right)$ of taking action $a_{2}$ from state $s_{0}$ at time step $t = 0$? What is the optimal action? [5 pts]

(c) Assume we initialize the value of each state to zero (i.e., at iteration $n = 0$, $\forall s: V_{n = 0}(s) = 0$). Show that value iteration continues to choose the sub-optimal action until iteration $n^{*}$, where

$$n^{*} \geq \frac{\log(1 - \gamma)}{\log \gamma} \geq \frac{1}{2} \log\left(\frac{1}{1 - \gamma}\right) \frac{1}{1 - \gamma}$$

Thus, value iteration has a running time that grows faster than $\frac{1}{1 - \gamma}$. (You just need to show the first inequality.) [10 pts]
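The question can be checked numerically with a minimal Python sketch. It is an illustration under stated assumptions, not an official solution. It assumes that states s1 and s2 are absorbing (the problem text implies this but does not say it outright) and that "iteration n" means the greedy action at s0 computed from V_n. The function names `returns_from_s0`, `first_optimal_iteration`, and `lower_bound` are hypothetical and chosen only for this example. The closed forms used for parts (a) and (b) come from summing the geometric series: action a1 yields gamma / (1 - gamma) and action a2 yields gamma^2 / (1 - gamma).

```python
import math


def returns_from_s0(gamma):
    """Closed-form discounted returns of a1 and a2 from s0 (geometric series)."""
    g_a1 = gamma / (1.0 - gamma)        # 0 + gamma*1 + gamma^2*1 + ...
    g_a2 = gamma ** 2 / (1.0 - gamma)   # immediate reward only, then zeros
    return g_a1, g_a2


def first_optimal_iteration(gamma, max_iters=1_000_000):
    """Smallest n at which the greedy action at s0, computed from V_n, is a1.

    Assumption: s1 and s2 are absorbing states.
    """
    r_a2 = gamma ** 2 / (1.0 - gamma)
    v_s1, v_s2 = 0.0, 0.0               # V_0(s) = 0 for every state
    for n in range(max_iters + 1):
        q_a1 = 0.0 + gamma * v_s1       # back up through s1
        q_a2 = r_a2 + gamma * v_s2      # back up through s2
        if q_a1 > q_a2:
            return n
        v_s1 = 1.0 + gamma * v_s1       # reward +1 in s1 on every step
        v_s2 = 0.0 + gamma * v_s2       # reward 0 in s2 on every step
    raise RuntimeError("a1 never became greedy within max_iters")


def lower_bound(gamma):
    """The bound log(1 - gamma) / log(gamma) from part (c)."""
    return math.log(1.0 - gamma) / math.log(gamma)


if __name__ == "__main__":
    for gamma in (0.6, 0.9, 0.99):
        print(gamma, returns_from_s0(gamma),
              first_optimal_iteration(gamma), lower_bound(gamma))
```

Under these assumptions V_n(s1) = (1 - gamma^n) / (1 - gamma), so a1 becomes greedy only once gamma^n < 1 - gamma, which is the first inequality in part (c). A different indexing convention for the iteration counter would shift the reported n by one without affecting the bound.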
