Consider the gridworld (Fig. 1). Here, the goal is to reach the goal state (the bottom right-hand corner grid) in as few steps as possible. The reward is 1 at the goal state and 0 everywhere else. The discount factor is 0.95. You can go up, down, right, or left. You will transition deterministically to the adjacent grid in the direction of the action. If you are in the goal state, you will stay there forever no matter what action is taken. You will also stay in the same state if you hit the wall due to an action (e.g., if you are in the top right-hand corner and you take the action right, you will hit the wall and will stay there in the next state).

Figure 1: The gridworld.

Clearly mention the states, the actions, the reward, and the transition probability. What is the dimension of the state space? From the initial state as depicted in Figure 1, you should be able to provide an optimal policy. What is the optimal value function? Now implement the Value Iteration algorithm to see whether it matches the result. Suppose you increase the reward at every grid by 1; will the optimal policy change?
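A minimal sketch of Value Iteration for this problem may help in checking a hand-derived answer. Figure 1 is not reproduced here, so the grid size `N = 4` is a hypothetical placeholder, and the helper names (`step`, `value_iteration`, `greedy_actions`) are illustrative only. The sketch also assumes the reward depends on the current state (1 while in the goal cell, 0 elsewhere); other reward conventions would shift the numbers. Under these assumptions the states are the N × N cells, the actions are the four moves, and each transition has probability 1.

```python
import numpy as np

N = 4          # ASSUMPTION: grid size is hypothetical; set it to match Figure 1
GAMMA = 0.95   # discount factor from the problem statement
ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
GOAL = (N - 1, N - 1)  # bottom right-hand corner


def step(state, action):
    """Deterministic transition: move one cell; stay put at the goal or on a wall hit."""
    if state == GOAL:
        return state
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < N and 0 <= nc < N:
        return (nr, nc)
    return state


def value_iteration(reward_offset=0.0, tol=1e-10):
    """ASSUMPTION: reward is a function of the current state, r(s) = 1 at the goal."""
    reward = np.full((N, N), reward_offset)
    reward[GOAL] += 1.0
    V = np.zeros((N, N))
    while True:
        V_new = np.empty_like(V)
        for r in range(N):
            for c in range(N):
                # Bellman optimality backup: V(s) = r(s) + gamma * max_a V(s')
                best_next = max(V[step((r, c), a)] for a in ACTIONS)
                V_new[r, c] = reward[r, c] + GAMMA * best_next
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


def greedy_actions(V, atol=1e-6):
    """Set of value-maximising actions in each state (ties are kept)."""
    policy = {}
    for r in range(N):
        for c in range(N):
            q = {a: V[step((r, c), a)] for a in ACTIONS}
            best = max(q.values())
            policy[(r, c)] = {a for a, v in q.items() if best - v <= atol}
    return policy


V = value_iteration()
V_shifted = value_iteration(reward_offset=1.0)  # reward increased by 1 everywhere
print(np.round(V, 3))
print(greedy_actions(V)[(0, 0)])
print(np.allclose(V_shifted - V, 1.0 / (1.0 - GAMMA)))
print(greedy_actions(V) == greedy_actions(V_shifted))
```

Under the stated reward convention, the goal state has value 1 / (1 - 0.95) = 20, and a cell that is d moves away from the goal has value 0.95^d × 20, so any shortest path is optimal. Adding 1 to every reward adds the same constant, 1 / (1 - 0.95) = 20, to every state's value, which leaves the greedy policy unchanged in this infinite-horizon discounted setting.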


