Question
Exercise 3. Consider a discounted dynamic programming problem with state space S = {0, 1}, where the set of admissible actions at any state x ∈ S is A(x) = {1, 2}. The cost function C(x, a) is given by

C(0, 1) = 1, C(1, 1) = 2, C(0, 2) = 0, C(1, 2) = 2.

The transition probabilities p(y | x, a) are fully determined by

p(0 | 0, 1) = 1/2, p(0 | 0, 2) = 1/4, p(0 | 1, 1) = 1/3, p(0 | 1, 2) = 2/3.

Let β = 1/2.

(a) Starting with W^(0)(x) = 0 for all x ∈ S, use the value iteration algorithm to approximate the value function W_β by W^(3) := TW^(2). Then what is the stationary policy obtained as the minimiser of TW^(3)? Determine with justifications whether it is an optimal policy. [40 marks]

(b) Now let f be the stationary policy that chooses action 1 in both states 0 and 1. Apply the policy iteration algorithm with the initial policy g^(0) = f to generate policies until you reach an optimal stationary policy. [10 marks]

(Hint: adjust the dynamic programming operator T_g as well as the value iteration and policy iteration algorithms accordingly, as we are dealing with a minimisation problem here.)
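As a check on the hand calculations, the following is a minimal Python sketch of the computations the exercise asks for. It assumes the transition probabilities read p(0|0,1) = 1/2, p(0|0,2) = 1/4, p(0|1,1) = 1/3, p(0|1,2) = 2/3 and β = 1/2 (one plausible reading of the problem data; adjust the numbers if your version of the exercise differs), and it uses exact fraction arithmetic so the iterates can be compared line by line with a hand computation.

```python
from fractions import Fraction as F

# Problem data. The fraction values below are an assumed reading of the
# transition probabilities in the question; change them if your notes differ.
states, actions = (0, 1), (1, 2)
beta = F(1, 2)                                    # discount factor beta = 1/2
C = {(0, 1): F(1), (1, 1): F(2), (0, 2): F(0), (1, 2): F(2)}
p0 = {(0, 1): F(1, 2), (0, 2): F(1, 4),           # p(0 | x, a); p(1 | x, a) = 1 - p(0 | x, a)
      (1, 1): F(1, 3), (1, 2): F(2, 3)}

def p(y, x, a):
    return p0[(x, a)] if y == 0 else 1 - p0[(x, a)]

def Q(W, x, a):
    """Cost of action a at state x plus discounted expected continuation value."""
    return C[(x, a)] + beta * sum(p(y, x, a) * W[y] for y in states)

def T(W):
    """Dynamic programming operator for the minimisation problem: (TW)(x) = min_a Q(W, x, a)."""
    return {x: min(Q(W, x, a) for a in actions) for x in states}

def greedy(W):
    """Stationary policy attaining the minimum in TW (ties broken toward action 1)."""
    return {x: min(actions, key=lambda a: Q(W, x, a)) for x in states}

# Part (a): three steps of value iteration from W^(0) = 0.
W = {0: F(0), 1: F(0)}
for k in range(1, 4):
    W = T(W)
    print(f"W^({k}) = {W}")
print("greedy policy from W^(3):", greedy(W))

# Part (b): policy iteration starting from the policy f with f(0) = f(1) = 1.
def evaluate(g):
    """Solve W_g = C_g + beta * P_g W_g exactly (2x2 linear system, Cramer's rule)."""
    c = [C[(0, g[0])], C[(1, g[1])]]
    q = [p0[(0, g[0])], p0[(1, g[1])]]
    A = [[1 - beta * q[0], -beta * (1 - q[0])],
         [-beta * q[1], 1 - beta * (1 - q[1])]]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return {0: (c[0] * A[1][1] - A[0][1] * c[1]) / det,
            1: (A[0][0] * c[1] - A[1][0] * c[0]) / det}

g = {0: 1, 1: 1}                                  # initial policy g^(0) = f
while True:
    Wg = evaluate(g)                              # policy evaluation
    g_next = greedy(Wg)                           # policy improvement
    print("policy", g, "has value", Wg, "-> improved policy", g_next)
    if g_next == g:                               # stable under improvement => optimal
        break
    g = g_next
print("optimal stationary policy:", g)
```

For part (a), one way to justify (or refute) optimality of the greedy policy obtained from W^(3) is to compare it with the stationary policy produced by policy iteration, which is optimal once it is stable under the improvement step; the sketch prints both so the two parts can be cross-checked.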