Python help: Question 3
Policy Iteration vs Value Iteration
In Part 3, you will compare the convergence time of policy iteration and value iteration. Both techniques are guaranteed to converge to the optimal policy in a finite number of steps, but we will see that the number of steps required can be considerably different.
3.A - Create Environment
Create a 30x30 instance of the FrozenPlatform environment with sp_range=[0,0.2], a start position of 1 (which is the default), no holes, and with random_state=1. You do not need to display the environment.
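The constructor call for this step can be sketched as follows, mirroring the argument names that appear in the 3.D starter code. FrozenPlatform itself comes from the course materials; a minimal stand-in class is defined here only so the snippet runs on its own, and the start position is left at its default of 1, as the instructions allow.

```python
# Minimal stand-in for the course's FrozenPlatform class, defined only
# so this sketch is self-contained. In the notebook, use the real class.
class FrozenPlatform:
    def __init__(self, rows, cols, sp_range, holes, random_state):
        self.rows, self.cols = rows, cols
        self.sp_range, self.holes = sp_range, holes
        self.random_state = random_state

# 3.A: 30x30 environment, sp_range=[0, 0.2], no holes, random_state=1.
# The start position is not set explicitly because 1 is the default.
fp = FrozenPlatform(rows=30, cols=30, sp_range=[0, 0.2],
                    holes=0, random_state=1)
```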
3.B - Policy Iteration
Create an instance of the DPAgent class for the environment created in Step 3.A. Set gamma=1 and random_state=1. Run policy iteration with the default parameters.
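A sketch of this step, using the DPAgent call signature shown in the 3.D starter code. DPAgent and the 3.A environment come from the course materials; minimal stand-ins are defined here only so the snippet runs on its own.

```python
# Minimal stand-in for the course's DPAgent class, defined only so this
# sketch is self-contained. In the notebook, use the real class.
class DPAgent:
    def __init__(self, env, gamma, random_state):
        self.env, self.gamma, self.random_state = env, gamma, random_state

    def policy_iteration(self, report=True):
        pass  # the real method alternates policy evaluation and improvement

fp = object()  # stand-in for the environment created in Step 3.A

# 3.B: policy iteration with gamma=1, random_state=1, default parameters
pi_agent = DPAgent(env=fp, gamma=1, random_state=1)
pi_agent.policy_iteration()
```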
3.C - Value Iteration
Create another instance of the DPAgent class for the environment created in Step 3.A. Set gamma=1 and random_state=1. Run value iteration with the default parameters.
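This step parallels 3.B, calling value_iteration instead. As above, DPAgent and the 3.A environment come from the course materials; minimal stand-ins are defined here only so the snippet runs on its own.

```python
# Minimal stand-in for the course's DPAgent class, defined only so this
# sketch is self-contained. In the notebook, use the real class.
class DPAgent:
    def __init__(self, env, gamma, random_state):
        self.env, self.gamma, self.random_state = env, gamma, random_state

    def value_iteration(self, report=True):
        pass  # the real method repeatedly applies the Bellman optimality update

fp = object()  # stand-in for the environment created in Step 3.A

# 3.C: value iteration with gamma=1, random_state=1, default parameters
vi_agent = DPAgent(env=fp, gamma=1, random_state=1)
vi_agent.value_iteration()
```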
3.D - Algorithm Comparison
In the previous steps, you should have noticed that policy iteration had a considerably longer runtime than value iteration. You will now explore how these runtimes depend on environment size.
Starter code has been provided for this step. The code uses a loop to create FrozenPlatform environments of size 2x2, 3x3, 4x4, and so on up to 25x25. Both policy iteration and value iteration will be applied to each environment. The time() function from the time module will be used to measure the runtime of each algorithm, storing the results in two separate lists.
After the loop is complete, the cell should output the following two messages, with the blanks filled in with the appropriate values, rounded to 4 decimal places.
%%time
# 3.D
rng = range(______, ______)
pol_iter_times = []
val_iter_times = []
#np.random.seed(1)
for i in tqdm(rng):
    temp_fp = FrozenPlatform(
        rows=______, cols=______, sp_range=[0.1, 0.4], holes=0, random_state=i
    )

    t0 = time.time()
    temp_dp = DPAgent(env=temp_fp, gamma=1, random_state=i)
    temp_dp.policy_iteration(report=False)
    delta_t = time.time() - t0
    ______.append(delta_t)

    t0 = time.time()
    temp_dp = DPAgent(env=temp_fp, gamma=1, random_state=i)
    temp_dp.value_iteration(report=False)
    delta_t = time.time() - t0
    ______.append(delta_t)

print(f'Average time for policy iteration: {np.mean(______):.4f}')
print(f'Average time for value iteration: {np.mean(______):.4f}')
Visualizing Results
Use Matplotlib to create a figure with two line plots on a single axis. The y-values for each line plot should come from the runtime lists created in the previous cell. The x-values should be the associated environment sizes (2 through 25). Create the figure according to the following specifications.
Set the figsize to [6,3].
The title should read "Runtime Comparison".
The x and y axes should be labeled "Environment Size" and "Runtime (in seconds)", respectively.
Add a legend with labels "Policy Iteration" and "Value Iteration" to explain which line corresponds to which algorithm.
Add a grid to your plot.
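The figure described above can be sketched as follows. The pol_iter_times and val_iter_times lists would come from the 3.D loop; short placeholder lists are used here so the snippet runs on its own.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt

# Placeholder runtimes; replace with the lists produced by the 3.D loop.
sizes = range(2, 26)  # environment sizes 2 through 25
pol_iter_times = [0.10 * n for n in sizes]  # placeholder data
val_iter_times = [0.02 * n for n in sizes]  # placeholder data

plt.figure(figsize=[6, 3])
plt.plot(sizes, pol_iter_times, label='Policy Iteration')
plt.plot(sizes, val_iter_times, label='Value Iteration')
plt.title('Runtime Comparison')
plt.xlabel('Environment Size')
plt.ylabel('Runtime (in seconds)')
plt.legend()
plt.grid()
plt.show()
```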
