Question

Problem 5 [30 points]. Carry out policy iteration over the MDP example covered in class with R given in Table 2 and γ = 0.9. For a state s, if R(s) = ±1, s is a terminal state. For the transition model, assume that the agent has 0.9 probability of going in the intended direction and 0.1 probability of moving to the left of the intended direction. For example, if the agent is at the lower left corner (coordinates (1,1)) and intends to go right, then it will reach (2,1) with 0.9 probability and (1,2) with 0.1 probability. If a target cell is not reachable, then the corresponding probability goes back to the current cell. For example, if the agent is at (3,3) and is trying to go up, then with 0.1 probability it goes to (2,3) and with 0.9 probability it is stuck at (3,3).

For your answer you should provide:

a) [15 points]. The first two iterations of your computation.
b) [15 points]. The converged values and the extracted policy. For this part, you need to provide the last two iterations, showing that the value changes are within 0.001 for all cells.

Table 2: Reward R for a 4 x 3 grid world

-0.05   -0.05   -0.05   +1
-0.05   OBS     -0.05   -1
-0.05   -0.05   -0.05   -0.05

As a suggestion, you should complete the first part manually to make sure you will be able to do so, for obvious reasons :). For solving the second, it is perhaps better to use a program, for example in Python or Excel.
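
Since the problem suggests using a program for part b), here is a minimal Python sketch of policy iteration on this grid. It is not the course's reference solution and rests on several assumptions: the garbled discount is read as gamma = 0.9; the reward table is read as -0.05 everywhere except +1 at (4,3) and -1 at (4,2), with the obstacle at (2,2); utilities follow the convention U(s) = R(s) + gamma * sum over s' of P(s'|s,a) U(s'); and terminal utilities are fixed at their rewards. All names (`transitions`, `evaluate`, `policy_iteration`, and so on) are illustrative.

```python
GAMMA = 0.9          # assumption: the garbled "-09" in the statement means gamma = 0.9
STEP_REWARD = -0.05  # assumption: every non-terminal cell has reward -0.05
TERMINALS = {(4, 3): 1.0, (4, 2): -1.0}
OBSTACLE = (2, 2)
STATES = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != OBSTACLE]
VALID = set(STATES)

# Actions in counter-clockwise order, so "left of" an action is the next entry.
ORDER = ["U", "L", "D", "R"]
MOVES = {"U": (0, 1), "L": (-1, 0), "D": (0, -1), "R": (1, 0)}


def reward(s):
    return TERMINALS.get(s, STEP_REWARD)


def move(s, a):
    """Cell reached from s by action a; blocked moves stay in place."""
    dx, dy = MOVES[a]
    t = (s[0] + dx, s[1] + dy)
    return t if t in VALID else s


def transitions(s, a):
    """0.9 to the intended cell, 0.1 to the cell left of the intended direction."""
    left = ORDER[(ORDER.index(a) + 1) % 4]
    return [(0.9, move(s, a)), (0.1, move(s, left))]


def q_value(s, a, U):
    return reward(s) + GAMMA * sum(p * U[t] for p, t in transitions(s, a))


def evaluate(policy, U, tol=1e-12):
    """Iterative policy evaluation (in-place sweeps) until changes fall below tol."""
    while True:
        delta = 0.0
        for s in STATES:
            new = reward(s) if s in TERMINALS else q_value(s, policy[s], U)
            delta = max(delta, abs(new - U[s]))
            U[s] = new
        if delta < tol:
            return U


def policy_iteration():
    policy = {s: "U" for s in STATES if s not in TERMINALS}  # arbitrary start
    U = {s: 0.0 for s in STATES}
    history = []  # value table after each policy evaluation
    while True:
        U = evaluate(policy, U)
        history.append(dict(U))
        stable = True
        for s in policy:
            best = max(ORDER, key=lambda a: q_value(s, a, U))
            # Switch only on a clear improvement, to avoid flip-flopping on ties.
            if q_value(s, best, U) > q_value(s, policy[s], U) + 1e-9:
                policy[s] = best
                stable = False
        if stable:
            return policy, U, history


if __name__ == "__main__":
    policy, U, history = policy_iteration()
    for y in (3, 2, 1):  # print the top row first, as in Table 2
        print("  ".join(
            "OBS     " if (x, y) == OBSTACLE
            else f"{U[(x, y)]:+.3f} {policy.get((x, y), '*')}"
            for x in range(1, 5)))
```

The `history` list holds the value table after each policy evaluation, so its first two entries correspond to what part a) asks for under the assumed all-"U" starting policy. A different starting policy or utility convention will give different intermediate tables, so check these against the conventions used in class.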
