Question
Consider the following gridworld. The available actions at any given state are North, East, West and South. There are 2 states with +5 and -5 rewards, as shown in the figure. They are also terminal states, where the agent can take an exit action. The grey cell is a blocked state that your agent can't move into. If taking an action in a state bumps the agent into a nearby wall, the agent's state doesn't change, i.e., the agent ends up in the same cell. The discount factor in this gridworld is 0.9, and the transition probability of taking an action at a given state is 0.8; the rest of the time, the agent can end up in a different state than expected, with equal probability. You can take the exit action at a terminal state with probability 1. (16 Points)

[Figure: gridworld showing the +5 and -5 terminal cells and the grey blocked cell]

(a) Perform 1 iteration of the Value Iteration algorithm. Draw the policy in the gridworld, marked with arrows, after the first iteration. Show your calculations for each state.

(b) Perform 2 iterations of the Value Iteration algorithm. Draw the policy in the gridworld, marked with arrows, after the second iteration. Show your calculations for each state.
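Both parts apply the same value iteration update. A sketch of it with this question's parameters follows, assuming a living reward of 0 (the question does not state one) and the usual initialization V_0 = 0. The last line is one example calculation for a hypothetical cell whose East neighbour is the +5 cell and whose other possible outcomes all lead to cells with V_1 = 0.

```latex
\begin{aligned}
V_0(s) &= 0 \quad \text{for all } s \\
V_{k+1}(s) &= \max_{a \in \{N, E, S, W\}} \sum_{s'} T(s, a, s')\,\bigl[R(s, a, s') + \gamma\, V_k(s')\bigr], \qquad \gamma = 0.9 \\
V_{k+1}(s_{+5}) &= 5, \qquad V_{k+1}(s_{-5}) = -5 \quad \text{(exit action, probability 1)} \\
V_1(s) &= 0 \quad \text{for every non-terminal } s \quad (\text{since } V_0 = 0,\ R = 0) \\
Q_2(s, E) &= 0.8 \cdot (0 + 0.9 \cdot 5) + 0.2 \cdot (0 + 0.9 \cdot 0) = 3.6 = V_2(s)
\end{aligned}
```

Any other action in that cell reaches the +5 cell with probability at most 0.2, so its Q-value is at most 0.2 · 0.9 · 5 = 0.9, and East is the maximizing action.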
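The following Python sketch runs value iteration on a gridworld of this kind and extracts the greedy policy as arrows. The figure is not reproduced here, so several things are assumptions rather than part of the question: the 3x4 layout (grey cell at row 1, column 1; +5 in the top-right corner with -5 directly below it), a living reward of 0, and an equal split of the remaining 0.2 probability among the three unintended directions. The helper names `move`, `transitions`, `q_value`, `value_iteration` and `greedy_policy` are hypothetical.

```python
# Value iteration sketch for a small gridworld.
# ASSUMPTIONS (not given in the question): the 3x4 layout below, a living
# reward of 0, and that the remaining 0.2 probability is split equally
# among the three unintended directions.

GAMMA = 0.9          # discount factor from the question
P_INTENDED = 0.8     # probability the chosen move goes where intended

ROWS, COLS = 3, 4                          # hypothetical grid size
WALLS = {(1, 1)}                           # hypothetical grey (blocked) cell
TERMINALS = {(0, 3): 5.0, (1, 3): -5.0}    # hypothetical +5 / -5 cells
ACTIONS = {"N": (-1, 0), "E": (0, 1), "S": (1, 0), "W": (0, -1)}
ARROWS = {"N": "^", "E": ">", "S": "v", "W": "<"}


def move(state, action):
    """Cell reached by moving in `action`; bumping the edge or the grey cell leaves the agent in place."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    if not (0 <= r < ROWS and 0 <= c < COLS) or (r, c) in WALLS:
        return state
    return (r, c)


def transitions(state, action):
    """(probability, next_state) pairs for a non-terminal state; probabilities sum to 1."""
    p_slip = (1.0 - P_INTENDED) / (len(ACTIONS) - 1)
    return [(P_INTENDED if a == action else p_slip, move(state, a)) for a in ACTIONS]


def q_value(values, state, action):
    # Living reward assumed to be 0, so only the discounted next-state value counts.
    return sum(p * GAMMA * values[s2] for p, s2 in transitions(state, action))


def value_iteration(iterations):
    """Return V_k after `iterations` synchronous Bellman updates, starting from V_0 = 0."""
    states = [(r, c) for r in range(ROWS) for c in range(COLS) if (r, c) not in WALLS]
    values = {s: 0.0 for s in states}
    for _ in range(iterations):
        new_values = {}
        for s in states:
            if s in TERMINALS:
                new_values[s] = TERMINALS[s]  # exit action, probability 1
            else:
                new_values[s] = max(q_value(values, s, a) for a in ACTIONS)
        values = new_values  # every state is updated from the previous V_k
    return values


def greedy_policy(values):
    """Arrow per non-terminal cell, chosen greedily with respect to `values`."""
    return {
        s: ARROWS[max(ACTIONS, key=lambda a: q_value(values, s, a))]
        for s in values
        if s not in TERMINALS
    }


if __name__ == "__main__":
    for k in (1, 2):
        v = value_iteration(k)
        pi = greedy_policy(v)
        print(f"After iteration {k}:")
        for r in range(ROWS):
            cells = []
            for c in range(COLS):
                if (r, c) in WALLS:
                    cells.append("  ###  ")
                elif (r, c) in TERMINALS:
                    cells.append(f"{v[(r, c)]:+6.2f} ")
                else:
                    cells.append(f"{v[(r, c)]:5.2f}{pi[(r, c)]} ")
            print("".join(cells))
```

Running the script prints the state values and greedy arrows after iterations 1 and 2. To match the actual figure, edit `ROWS`, `COLS`, `WALLS` and `TERMINALS`. Where several actions tie (for example, cells far from both terminals), `max` keeps the first action in `ACTIONS` order, so those arrows are arbitrary.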