Question

Question 1. A decision maker observes a discrete-time system which moves between states $\{s_1, s_2, s_3, s_4\}$ according to the following transition probability matrix:

\[
P = \begin{pmatrix}
0.3 & 0.4 & 0.2 & 0.1 \\
0.2 & 0.3 & 0.5 & 0.0 \\
0.1 & 0.0 & 0.8 & 0.1 \\
0.4 & 0.0 & 0.0 & 0.6
\end{pmatrix}
\]

At each point in time, the decision maker may leave the system and receive a reward of $R = 20$ units, or alternatively remain in the system and receive a reward of $r(s_i)$ units if the system occupies state $s_i$. If the decision maker decides to remain in the system, its state at the next decision epoch is determined by the matrix $P$. Assume a discount rate of 0.9 and that $r(s_i) = i$.

a) Formulate this model as an MDP.

b) Use both policy iteration and linear programming to find a stationary policy which minimizes the expected total discounted reward. Compare the results, and report the optimal policy and the optimal value function for both methods.

c) Find the smallest value of R so that it is optimal to leave the system in state 2.
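As a starting point for parts a) and b), the stay/leave model can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch under stated assumptions, not a verified solution: it reads the garbled reward as r(s_i) = i, treats the "discount rate of 0.9" as a discount factor, models leaving as a move to an absorbing zero-reward state, and assumes the objective is to maximize the expected total discounted reward (the usual convention for rewards, although the statement as written says "minimizes"). The names `evaluate` and `policy_iteration` are hypothetical helpers introduced here.

```python
import numpy as np

# Transition matrix P from the problem statement (row = current state s1..s4).
P = np.array([
    [0.3, 0.4, 0.2, 0.1],
    [0.2, 0.3, 0.5, 0.0],
    [0.1, 0.0, 0.8, 0.1],
    [0.4, 0.0, 0.0, 0.6],
])
r = np.array([1.0, 2.0, 3.0, 4.0])  # ASSUMPTION: r(s_i) = i
R = 20.0      # one-time reward for leaving the system
gamma = 0.9   # ASSUMPTION: "discount rate of 0.9" read as the discount factor


def evaluate(leave, R=R):
    """Policy evaluation: solve v = r_d + gamma * P_d v for a fixed policy.

    leave[s] is True if the policy leaves in state s. Leaving pays R and
    moves to an absorbing state of value 0, so that row of P_d is zero.
    """
    P_d = np.where(leave[:, None], 0.0, P)
    r_d = np.where(leave, R, r)
    return np.linalg.solve(np.eye(len(r)) - gamma * P_d, r_d)


def policy_iteration(R=R, max_iter=100):
    """Policy iteration for the maximization version of the problem."""
    leave = np.zeros(len(r), dtype=bool)  # initial policy: stay everywhere
    for _ in range(max_iter):
        v = evaluate(leave, R)
        q_stay = r + gamma * P @ v        # value of staying one more period
        tie = np.isclose(q_stay, R)
        # Improvement step: leave where R is strictly better; on ties keep
        # the current action so the iteration cannot cycle.
        new_leave = np.where(tie, leave, R > q_stay)
        if np.array_equal(new_leave, leave):
            return v, leave
        leave = new_leave
    raise RuntimeError("policy iteration did not converge")


v_opt, leave_opt = policy_iteration()
print("leave in state:", leave_opt)
print("value function:", np.round(v_opt, 3))
```

For part c), one possible approach under the same assumptions is to call `policy_iteration(R=...)` over a range of R values and look for the point at which the returned policy first leaves in state 2. The linear programming formulation is not shown here.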
