
Question


Question 6. (6 marks)
Consider the following MDP: the set of states is S = {s0, s1, s2, s3} and the set of actions available at each
state is A = {l, r}. Each episode of the MDP starts in s1 and terminates in s0.
You do not know the transition probabilities or the reward function of the MDP, so you are using Sarsa
to find the optimal policy. Suppose the current Q-values are:
Q(s0,l) = 0,    Q(s0,r) = 0
Q(s1,l) = 3.4,  Q(s1,r) = -1.8
Q(s2,l) = -0.8, Q(s2,r) = -0.7
Q(s3,l) = -0.5, Q(s3,r) = 7.5
Suppose the next episode is as follows:
s1, l, -1, s1, r, -1, s2, l, -1, s1, l, 10, s0.
(a) (4 marks) Do all the Sarsa updates to the Q-values that would result from this episode, using α = 0.25
and γ = 0.9. Show your working.
(b) (1 mark) Based on the updated Q-values, give the final policy determined by Q, i.e., give π(s1), π(s2),
and π(s3). Show your working.
(c) (1 mark) Give an ε-greedy policy based on the Q-values obtained in (a).
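
Part (a) relies on the standard Sarsa update rule, applied once per transition in the order the episode visits them:

Q(s,a) ← Q(s,a) + α[r + γ·Q(s',a') − Q(s,a)],

where (s, a, r, s', a') is one step of the episode and Q(s',a') is taken to be 0 when s' is the terminal state s0. A runnable sketch of the full computation follows the solution heading below.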

Step by Step Solution
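
Below is a minimal Python sketch of the computation, assuming the standard tabular Sarsa rule stated above: it replays the four transitions of the episode, applies the update at each step (so the third update already sees the updated Q(s1,l)), then reads off the greedy policy for (b) and an ε-greedy policy for (c). The exploration rate eps = 0.1 is an illustrative assumption; the question does not fix ε.

```python
import random

alpha, gamma, eps = 0.25, 0.9, 0.1  # eps is an illustrative choice; the question leaves it open

# Current Q-values from the question
Q = {
    ('s0', 'l'): 0.0,  ('s0', 'r'): 0.0,
    ('s1', 'l'): 3.4,  ('s1', 'r'): -1.8,
    ('s2', 'l'): -0.8, ('s2', 'r'): -0.7,
    ('s3', 'l'): -0.5, ('s3', 'r'): 7.5,
}

# The episode s1,l,-1,s1,r,-1,s2,l,-1,s1,l,10,s0 as (s, a, r, s', a') steps;
# a' is None on the final transition into the terminal state s0.
episode = [
    ('s1', 'l', -1, 's1', 'r'),
    ('s1', 'r', -1, 's2', 'l'),
    ('s2', 'l', -1, 's1', 'l'),
    ('s1', 'l', 10, 's0', None),
]

# (a) one Sarsa update per transition, in episode order
for s, a, r, s_next, a_next in episode:
    q_next = Q[(s_next, a_next)] if a_next is not None else 0.0  # terminal state has value 0
    Q[(s, a)] += alpha * (r + gamma * q_next - Q[(s, a)])
    print(f"Q({s},{a}) <- {Q[(s, a)]:.6g}")

# (b) greedy policy: pick the action with the larger Q-value in each state
for s in ('s1', 's2', 's3'):
    print(f"pi({s}) =", max(('l', 'r'), key=lambda a: Q[(s, a)]))

# (c) epsilon-greedy: greedy with probability 1 - eps, uniformly random otherwise
def eps_greedy(s):
    if random.random() < eps:
        return random.choice(('l', 'r'))
    return max(('l', 'r'), key=lambda a: Q[(s, a)])
```

Under this rule the updates come out as Q(s1,l) = 1.895 after the first transition, Q(s1,r) = -1.78, Q(s2,l) = -0.423625, and Q(s1,l) = 3.92125 after the terminal transition, giving the greedy policy π(s1) = l, π(s2) = l, π(s3) = r. The ε-greedy policy takes these actions with probability 1 − ε + ε/2 and the other action with probability ε/2.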

