Question
Consider the simple MDP shown below. Starting from state s1, the agent can move to the right (a0) or left (a1) from any state si. Actions are deterministic (e.g., choosing a0 at state si results in a transition to the state on its right, s(i+1)). Taking any action from the goal state G earns a reward of r, and the agent stays in state G. Otherwise, each move has zero reward (r = 0). Assume a discount factor γ < 1.
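For concreteness, here is a minimal value-iteration sketch of the chain described above. The state count n = 5, the goal reward r = +1, and γ = 0.9 are illustrative assumptions, not values given in the problem, and the left boundary is assumed to be a wall (moving left from s1 keeps the agent in s1).

import numpy as np

# Chain MDP sketch: states s1..sn are indices 0..n-1, the goal G is index n.
# Assumed for illustration: n = 5, goal reward r = +1, gamma = 0.9.
n, r, gamma = 5, 1.0, 0.9
G = n

def step(s, a):
    # Deterministic transitions: a = 0 moves right, a = 1 moves left.
    # Any action taken in G stays in G and earns reward r; all other moves earn 0.
    if s == G:
        return G, r
    if a == 0:
        return min(s + 1, G), 0.0
    return max(s - 1, 0), 0.0  # assumed wall at the left end

# Value iteration: V[s] <- max_a [ reward(s, a) + gamma * V[next(s, a)] ]
V = np.zeros(n + 1)
for _ in range(10_000):
    V_new = np.array([max(rew + gamma * V[s2] for s2, rew in (step(s, 0), step(s, 1)))
                      for s in range(n + 1)])
    if np.max(np.abs(V_new - V)) < 1e-12:
        V = V_new
        break
    V = V_new

print(V)  # under these assumptions V[G] = r / (1 - gamma), and V drops by a factor of gamma per state to the left of G

Parts (c) and (d) below can be explored with the same sketch by replacing each reward rew with a * (c + rew) and re-running.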
(a) What is the optimal action at any state si ≠ G? Find the optimal value function for all states si and the goal state G.
(b) Does the optimal policy depend on the value of the discount factor γ? Explain your answer.
(c) Now consider adding a constant c to all rewards. Find the new optimal value function for all states si and the goal state G. Does adding a constant reward c change the optimal policy? Explain your answer.
(d) After adding a constant c to all rewards, now consider scaling all rewards by a constant a (i.e., r_new = a(c + r_old)). Find the new optimal value function for all states si and the goal state G. Does this change the optimal policy? Explain your answer. If yes, give an example of a and c that changes the optimal policy.
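As a general hint (not the graded solution): in any infinite-horizon discounted MDP, adding a constant c to every reward and then scaling by a constant a applies the same affine map to every policy's value function, because the per-step constant contributes a geometric series:

V^{\pi}_{\text{new}}(s) \;=\; \mathbb{E}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, a\,(c + r_t)\right] \;=\; a\,V^{\pi}_{\text{old}}(s) \;+\; \frac{a\,c}{1-\gamma}.

Since the same transformation applies to every policy, the ordering of policies is preserved whenever a > 0, while a negative a reverses it.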