Question

Consider a two-state Markov decision process (MDP) with states s1 and s2. In state s1, the decision maker chooses either action a1 or action a2; in state s2, only action a3 is available. The immediate rewards and transition probabilities are as follows:

r(s1, a1) = 4, r(s1, a2) = 10, r(s2, a3) = 2;
p(s1 | s1, a1) = p(s2 | s1, a1) = 0.5, p(s2 | s1, a2) = 1, p(s1 | s2, a3) = 0.2, p(s2 | s2, a3) = 0.8.

(a) Solve the three-period problem with terminal reward r4(s1) = r4(s2) = 0 to maximize the expected total reward, and find the optimal decision rule in each period.

(b) Consider the infinite-horizon discounted MDP with discount factor λ = 0.5. Calculate the expected total discounted reward of the stationary policy δ^∞ with δ(s1) = a1 and δ(s2) = a3. Then use the optimality equations to check whether it is the optimal policy.
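Part (a) calls for backward induction: starting from the terminal values v4(s1) = v4(s2) = 0, compute v_t(s) = max_a [ r(s, a) + Σ_{s'} p(s' | s, a) v_{t+1}(s') ] for t = 3, 2, 1. A minimal sketch in Python, using the rewards and transition probabilities from the problem statement (the names `model`, `actions`, and `rule` are illustrative, not from the source):

```python
# Backward induction for the three-period problem of part (a).
# Each (state, action) pair maps to (reward, {next_state: probability}),
# transcribed directly from the problem data.
model = {
    ("s1", "a1"): (4,  {"s1": 0.5, "s2": 0.5}),
    ("s1", "a2"): (10, {"s2": 1.0}),
    ("s2", "a3"): (2,  {"s1": 0.2, "s2": 0.8}),
}
actions = {"s1": ["a1", "a2"], "s2": ["a3"]}

v = {"s1": 0.0, "s2": 0.0}  # terminal values r4(s1) = r4(s2) = 0
for t in (3, 2, 1):
    new_v, rule = {}, {}
    for s in ("s1", "s2"):
        # Q-value of each available action: immediate reward plus
        # expected continuation value under the next-period values v.
        q = {}
        for a in actions[s]:
            r, trans = model[(s, a)]
            q[a] = r + sum(p * v[nxt] for nxt, p in trans.items())
        best = max(q, key=q.get)
        new_v[s], rule[s] = q[best], best
    v = new_v
    print(f"period {t}: values {v}, decision rule {rule}")
```

Working the recursion by hand with these numbers gives v3(s1) = 10, v2(s1) = 12, v1(s1) = 15.6 and v3(s2) = 2, v2(s2) = 5.6, v1(s2) = 8.88, with a2 the maximizing action in s1 at every period (a3 is forced in s2).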
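For part (b), the value of the stationary policy δ^∞ solves the linear system v = r_δ + λ P_δ v, i.e. (I − λ P_δ) v = r_δ, where P_δ and r_δ are the transition matrix and reward vector under δ(s1) = a1, δ(s2) = a3. A sketch of the evaluation and the optimality-equation check, again using the given data:

```python
import numpy as np

# Evaluate the stationary policy d(s1)=a1, d(s2)=a3 at discount factor 0.5
# by solving (I - lam * P) v = r; rows are states s1, s2.
lam = 0.5
P = np.array([[0.5, 0.5],    # transitions from s1 under a1
              [0.2, 0.8]])   # transitions from s2 under a3
r = np.array([4.0, 2.0])     # r(s1, a1), r(s2, a3)

v = np.linalg.solve(np.eye(2) - lam * P, r)
print("v(s1), v(s2) =", v)

# Optimality check at s1 (s2 has only one action): compare the
# one-step lookahead value of each action against v(s1).
q_a1 = 4 + lam * (0.5 * v[0] + 0.5 * v[1])
q_a2 = 10 + lam * (1.0 * v[1])
print("Q(s1,a1) =", q_a1, "  Q(s1,a2) =", q_a2)
```

Solving the system by hand: 0.75 v(s1) − 0.25 v(s2) = 4 and −0.1 v(s1) + 0.6 v(s2) = 2, giving v(s1) = 116/17 ≈ 6.82 and v(s2) = 76/17 ≈ 4.47. Since Q(s1, a2) = 10 + 0.5 v(s2) ≈ 12.24 exceeds v(s1), the policy violates the optimality equation at s1, so δ^∞ is not optimal: switching to a2 in s1 improves the discounted reward.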


