Question
In the following questions, assume that the mathematical model of an MDP $(S, A, p_s, r, \gamma)$ is given in the following form.
- The state set $S$ is represented by integers, $S = \{1, \dots, n\}$.
- The action set $A$ is represented by integers, $A = \{1, \dots, m\}$.
- The state transition probabilities $p_s$ are given by a 3D array P in the form $p_{ijk} = p_s(s_j \mid s_i, a_k)$.
- The rewards $r$ are given by a 3D array R in the form $r_{ijk} = r(s_i, s_j, a_k)$.

Note that since the sets $S$ and $A$ are composed of consecutive integers starting from 1, it is enough to know $m$ and $n$ to define these sets. Both $m$ and $n$ can be obtained from the dimensions of the array P or R.

1) Consider a finite horizon MDP $(S, A, p_s, r, 1)$ with the final time $T$.
a) Write a MATLAB function that takes the arrays P and R and the horizon T corresponding to the finite horizon MDP and computes the optimal value function and policy of this MDP. The function should return two values. One is a 2D matrix V composed of the elements $v_{it}$ of the time-dependent optimal value function. The other is PI, composed of the elements $\pi_{it}$ of the time-dependent optimal policy.
b) Find the optimal value function and policy of the Stochastic Roller problem studied in the class for $T = 10$ using the function you wrote in part a). Note that you should represent the states L, M, H, and the actions spin and don't spin using integers
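The backward recursion that part a) asks for can be sketched as follows. This is a minimal illustration in plain Python rather than MATLAB, and it rests on several assumptions that are not stated in the question: the function name `finite_horizon_dp` is hypothetical, the terminal value at time T is taken to be zero, V is given T + 1 columns (times 0 to T) and PI is given T columns (times 0 to T - 1), and P and R are nested lists indexed from 0 as `P[i][j][k]`. The actions stored in PI are shifted back to the 1-based integer labels used in the question.

```python
def finite_horizon_dp(P, R, T):
    """Backward induction for a finite horizon MDP with discount factor 1.

    P[i][j][k] is the probability of moving from state i to state j under
    action k, and R[i][j][k] is the reward for that transition (0-based
    indices). Assumes a zero terminal value at time T.
    """
    n = len(P)        # number of states, read from the first dimension
    m = len(P[0][0])  # number of actions, read from the third dimension

    # V[i][t] is the optimal value of state i at time t; column T stays zero.
    V = [[0.0] * (T + 1) for _ in range(n)]
    # PI[i][t] is the optimal action (1-based label) in state i at time t.
    PI = [[0] * T for _ in range(n)]

    for t in range(T - 1, -1, -1):      # sweep backward from T - 1 to 0
        for i in range(n):
            best_q, best_k = None, 0
            for k in range(m):
                # Expected reward plus value of the next state.
                q = sum(P[i][j][k] * (R[i][j][k] + V[j][t + 1])
                        for j in range(n))
                # Strict comparison: ties go to the lowest-numbered action.
                if best_q is None or q > best_q:
                    best_q, best_k = q, k
            V[i][t] = best_q
            PI[i][t] = best_k + 1       # report actions as 1..m
    return V, PI
```

A MATLAB version would follow the same loop structure, with `size(P)` supplying n and m and with column indices shifted by one because MATLAB arrays start at 1. The Stochastic Roller data for part b) is not given on this page, so it is not reproduced here.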