
Question



Exercise 3. Consider a discounted dynamic programming problem with state space S = {0, 1} and set of admissible actions A(x) = {1, 2} at every state x in S. The cost function C(x, a) is given by

C(0, 1) = 1, C(1, 1) = 2, C(0, 2) = 0, C(1, 2) = 2.

The transition probabilities p(y | x, a) are fully determined by

p(0 | 0, 1) = 1/2, p(0 | 0, 2) = 1/4, p(0 | 1, 1) = 1/3, p(0 | 1, 2) = 2/3.

Let the discount factor be beta = 1/2.

(a) Starting with W^(0)(x) = 0 for all x in S, use the value iteration algorithm to approximate the value function W by W^(3) := T^3 W^(0). What is the stationary policy obtained as the minimiser in T W^(3)? Determine, with justification, whether it is an optimal policy. [40 marks]

(b) Now let f be the stationary policy that chooses action 1 in both states 0 and 1. Apply the policy iteration algorithm with initial policy g^(0) = f to generate policies until you reach an optimal stationary policy. [10 marks]

(Hint: adjust the dynamic programming operator T_g as well as the value iteration and policy iteration algorithms accordingly, as we are dealing with a minimisation problem here.)
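The two algorithms the exercise asks for can be sketched directly. The sketch below is a minimal illustration, not a model answer: the transcription of the problem data is garbled, so the numbers used here — p(0|0,1) = 1/2, p(0|0,2) = 1/4, p(0|1,1) = 1/3, p(0|1,2) = 2/3 and beta = 1/2 — are one plausible reading and should be checked against the original problem sheet. The structure of the computation (three applications of the minimising operator T for part (a); evaluate-then-improve for part (b)) is the same whatever the exact fractions are.

```python
from fractions import Fraction as F

# Problem data -- assumed reading of the garbled statement, not confirmed.
S = [0, 1]
A = [1, 2]
beta = F(1, 2)
C = {(0, 1): 1, (1, 1): 2, (0, 2): 0, (1, 2): 2}
p0 = {(0, 1): F(1, 2), (0, 2): F(1, 4), (1, 1): F(1, 3), (1, 2): F(2, 3)}

def p(y, x, a):
    """Transition probability p(y | x, a); only p(0 | x, a) is given."""
    return p0[(x, a)] if y == 0 else 1 - p0[(x, a)]

def T(W):
    """One step of the minimising dynamic programming operator."""
    return {x: min(C[(x, a)] + beta * sum(p(y, x, a) * W[y] for y in S)
                   for a in A)
            for x in S}

def greedy(W):
    """Stationary policy minimising the one-step lookahead at W."""
    return {x: min(A, key=lambda a: C[(x, a)]
                   + beta * sum(p(y, x, a) * W[y] for y in S))
            for x in S}

# (a) Value iteration: W^(3) = T^3 W^(0), starting from W^(0) = 0,
# then read off the minimising stationary policy.
W = {0: F(0), 1: F(0)}
for _ in range(3):
    W = T(W)
policy_a = greedy(W)

def evaluate(g):
    """Exact cost of stationary policy g: solve the 2x2 linear system
    w_x = C(x, g(x)) + beta * sum_y p(y | x, g(x)) * w_y by Cramer's rule."""
    a0, a1 = p0[(0, g[0])], p0[(1, g[1])]
    c0, c1 = C[(0, g[0])], C[(1, g[1])]
    m00, m01 = 1 - beta * a0, -beta * (1 - a0)
    m10, m11 = -beta * a1, 1 - beta * (1 - a1)
    det = m00 * m11 - m01 * m10
    return {0: (c0 * m11 - m01 * c1) / det,
            1: (m00 * c1 - m10 * c0) / det}

# (b) Policy iteration from f(0) = f(1) = 1: evaluate the current
# policy exactly, improve greedily, stop at a fixed point.
g = {0: 1, 1: 1}
while True:
    g_new = greedy(evaluate(g))
    if g_new == g:
        break
    g = g_new
policy_b = g
```

Under the assumed data, the greedy policy extracted from W^(3) coincides with the fixed point of policy iteration, which is one way to argue optimality in part (a); with the true fractions from the original sheet, the same two computations apply unchanged.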

