Question

Posted on Sep 26, 2024

Problem 5 [30 points]. Carry out policy iteration over the MDP example covered in class with R given in Table 2 and γ = 0.9. For a state s, if R(s) = 1, s is a terminal state. For the transition model, assume that the agent has 0.9 probability of moving in the intended direction and 0.1 probability of moving to the left of the intended direction. For example, if the agent is at the lower left corner (coordinates (1, 1)) and intends to go right, then it will reach (2, 1) with 0.9 probability and (1, 2) with 0.1 probability. If a target cell is not reachable, then the corresponding probability goes back to the current cell. For example, if the agent is at (3, 3) and is trying to go up, then with 0.1 probability it goes to (2, 3) and with 0.9 probability it is stuck at (3, 3).

For your answer you should provide:

a) [15 points]. The first two iterations of your computation.

b) [15 points]. The converged rewards and the extracted policy. For this problem, you need to provide the last two iterations showing that the value changes are within 0.001 for all cells.

Table 2: Reward R for a 4 x 3 grid world (cell positions not preserved in the transcription; entries as extracted): 0.05 OBS 0.051 0.05 0.05 0.05 0.05
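The transition model and the policy iteration loop described in the problem can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the course's reference solution. The names `transitions` and `policy_iteration` are hypothetical. The obstacle position, the reward dictionary, and the set of terminal states are left as caller-supplied arguments, because Table 2 is not legible in the transcription. The update V(s) = R(s) + γ Σ P(s' | s, a) V(s') is one common convention and may differ from the one used in class. Policy evaluation here is run to a tight tolerance (`eval_tol`); the 0.001 threshold in part (b) would be checked separately on the reported iterations.

```python
from typing import Dict, Optional, Set, Tuple

State = Tuple[int, int]  # (column, row), 1-indexed; (1, 1) is the lower-left corner

WIDTH, HEIGHT = 4, 3
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
# Direction 90 degrees to the left of each intended direction.
LEFT_OF = {"up": "left", "left": "down", "down": "right", "right": "up"}


def step(state: State, direction: str, obstacle: Optional[State] = None) -> State:
    """Return the cell reached by one move, or `state` itself if the move is blocked."""
    dx, dy = MOVES[direction]
    target = (state[0] + dx, state[1] + dy)
    inside = 1 <= target[0] <= WIDTH and 1 <= target[1] <= HEIGHT
    return target if inside and target != obstacle else state


def transitions(state: State, action: str,
                obstacle: Optional[State] = None) -> Dict[State, float]:
    """0.9 for the intended direction, 0.1 for the direction to its left."""
    probs: Dict[State, float] = {}
    for direction, p in ((action, 0.9), (LEFT_OF[action], 0.1)):
        nxt = step(state, direction, obstacle)
        probs[nxt] = probs.get(nxt, 0.0) + p
    return probs


def policy_iteration(rewards: Dict[State, float], terminals: Set[State],
                     gamma: float = 0.9, obstacle: Optional[State] = None,
                     eval_tol: float = 1e-10, margin: float = 1e-6):
    """Alternate policy evaluation and greedy improvement until the policy is stable.

    `rewards` must contain every non-obstacle cell (assumption: supplied by the caller).
    """
    states = [s for s in rewards if s != obstacle]
    policy = {s: "up" for s in states if s not in terminals}  # arbitrary start
    values = {s: 0.0 for s in states}

    def expected(s: State, a: str) -> float:
        # Expected value of the successor state under action a.
        return sum(p * values[n] for n, p in transitions(s, a, obstacle).items())

    while True:
        # Policy evaluation: in-place sweeps until the largest change is below eval_tol.
        while True:
            delta = 0.0
            for s in states:
                if s in terminals:
                    new_v = rewards[s]
                else:
                    new_v = rewards[s] + gamma * expected(s, policy[s])
                delta = max(delta, abs(new_v - values[s]))
                values[s] = new_v
            if delta < eval_tol:
                break
        # Policy improvement: switch only on a clear gain, so ties cannot cause cycling.
        stable = True
        for s in policy:
            best = max(MOVES, key=lambda a: expected(s, a))
            if expected(s, best) > expected(s, policy[s]) + margin:
                policy[s], stable = best, False
        if stable:
            return values, policy
```

Calling `transitions((1, 1), "right")` reproduces the first example in the problem statement, and `transitions((3, 3), "up")` reproduces the second, which is a quick way to check the "left of the intended direction" reading before running the full loop.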


