Question

I'm really struggling with value iteration and the discount factor on these problems. Please help me solve them with steps so that I can learn how to work them. Thank you!
Consider Pacman, who uses MDPs to maximize his expected utility. In each environment:
- Pacman has the standard actions (North, East, South, West) unless blocked by an outer wall.
- There is a reward of 1 point when eating the dot (for example, in the grid below, R(A, South, B) = 1).
- The game ends when the dot (blue circle) is eaten.

(a) Consider the following grid where there is a single food pellet in the bottom right corner (B). The discount factor is 0.2. There is no living reward. The states are simply the grid locations.

a) What is the optimal policy for each state?
b) What is the optimal value for the state of being in the upper left corner (E)? Reminder: the discount factor is 0.2.
c) Using value iteration with the value of all states equal to zero at k = 0, for which iteration k will V_k(F) = V*(F)? Explain.

Consider the following grid world MDP for the rest of this question. Shaded cells represent walls. In all states, the agent has available actions up, down, left, right (↑, ↓, ←, →). Performing an action that would transition to an invalid state (outside the grid or into a wall) results in the agent remaining in its original state. In states with an arrow coming out, the agent has an additional action EXIT. In the event that the EXIT action is taken, the agent receives the labeled reward and ends the game in the terminal state T. Unless otherwise stated, all other transitions receive no reward, and all transitions are deterministic.

For all parts of the problem, assume that value iteration begins with all states initialized to zero, i.e., V_0(s) = 0 for all s. Let the discount factor be γ = 0.25 for all following parts.

a) Suppose that we are performing value iteration on the grid world MDP below. What are the optimal values V*(A) and V*(B)? Explain your answer (show how you compute them).
b) After how many iterations k will we have V_k(s) = V*(s) for all states s? If it never occurs, write "never". Show your computation.
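Since the grid pictures did not come through with the question, here is a minimal sketch of value iteration on a stand-in grid, just to show the mechanics. Everything about the layout is an assumption: a 2x3 grid with no inner walls, cells named by (row, col) rather than by the letters in the problem, and the pellet at the bottom right. The discount factor 0.2, the reward of 1 for eating the dot, the zero living reward, and the V_0 = 0 start are taken from the problem statement. The names value_iteration, greedy_policy and first_exact_iteration are made up for this illustration.

```python
GAMMA = 0.2                 # discount factor from the first problem
ROWS, COLS = 2, 3           # ASSUMED layout; the real grid image is missing
PELLET = (1, 2)             # bottom-right cell (assumed coordinates)
MOVES = {"North": (-1, 0), "East": (0, 1), "South": (1, 0), "West": (0, -1)}
STATES = [(r, c) for r in range(ROWS) for c in range(COLS)]


def successors(state):
    """Yield (action, next_state, reward) for moves not blocked by the outer wall."""
    r, c = state
    for action, (dr, dc) in MOVES.items():
        nr, nc = r + dr, c + dc
        if 0 <= nr < ROWS and 0 <= nc < COLS:
            nxt = (nr, nc)
            # Reward 1 for the move that eats the dot, 0 otherwise.
            yield action, nxt, (1.0 if nxt == PELLET else 0.0)


def value_iteration(max_iters=50):
    """Return [V_0, V_1, ...], stopping once an iteration changes nothing."""
    V = {s: 0.0 for s in STATES}            # V_0(s) = 0 for all s
    history = [V]
    for _ in range(max_iters):
        new_V = {}
        for s in STATES:
            if s == PELLET:
                new_V[s] = 0.0              # game is over here, nothing more to earn
            else:
                # Deterministic Bellman update: best of reward + gamma * V_k(next)
                new_V[s] = max(rew + GAMMA * V[nxt] for _, nxt, rew in successors(s))
        history.append(new_V)
        if new_V == V:
            break
        V = new_V
    return history


def greedy_policy(V):
    """Pick, in each non-terminal state, an action that maximizes the one-step lookahead."""
    policy = {}
    for s in STATES:
        if s != PELLET:
            policy[s] = max(successors(s), key=lambda t: t[2] + GAMMA * V[t[1]])[0]
    return policy


def first_exact_iteration(history, state):
    """Smallest k for which V_k(state) already equals the converged value."""
    final = history[-1][state]
    return next(k for k, V in enumerate(history) if V[state] == final)


history = value_iteration()
policy = greedy_policy(history[-1])
```

On this assumed layout the upper-left cell is three moves from the pellet, so its value works out to 0.2^2 = 0.04 (two discounted steps before the rewarded third move), and V_k first matches it at k = 3. In general, with deterministic moves and a single reward of 1, a state that is d moves from the dot has V* = γ^(d-1) and is first correct at iteration k = d. The same update loop carries over to the second grid world if EXIT is added as an extra action with its labeled reward and a move into a wall is treated as staying in place.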

