Answered step by step
Verified Expert Solution
Question
1 Approved Answer
-A decision maker observes a discrete-time system which moves between states {S1, S2, S3, S4} according to the following transition probability matrix: 0.3 0.2
-A decision maker observes a discrete-time system which moves between states {S1, S2, S3, S4} according to the following transition probability matrix: 0.3 0.2 0.1 0.4 P = 0.4 0.2 0.1 0.5 0.0 0.3 0.0 0.8 0.1 0.0 0.0 0.6 At each point in time, the decision maker may leave the system and receive a reward of R = 20 units, or alternatively remain in the system and receive a reward of r(s;) units if the system occupies state s;. If the decision maker decides to remain in the system its state at the next decision epoch is determined by matrix P. Assume a discount rate of 0.9 and that r(s;) = i. a) Formulate this model as an MDP. b) Use both policy iteration and linear programming to find a stationary policy which minimizes the expected total discounted reward. compare the results, and report the optimal policy and the optimal value function for both methods. c) Find the smallest value of R so that it is optimal to leave the system in state 2.
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started