

Question


Posted on Sep 22, 2024


help with python:

Average Path Length

In Part 4, you will apply value iteration to a relatively large FrozenPlatform environment and will then study the average path length for successful and unsuccessful episodes run under the optimal policy.

4.A - Create Environment

Create a 12 x 24 instance of the FrozenPlatform environment with sp_range=[0.1, 0.4], a start position of 27, 25 holes, and random_state=1.

Display the environment, setting fill to shade the cells according to their slip probabilities, size=3, and show_nums=False.
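The start position is given as a single integer (27) rather than a (row, column) pair. If FrozenPlatform numbers its cells row by row, as Gym's FrozenLake does (an assumption; check the course documentation), the conversion is one divmod, and state 27 in a 12 x 24 grid is row 1, column 3 (0-based). The helper names below are hypothetical and not part of the course API.

```python
def to_row_col(state, n_cols):
    """Convert a flat cell index to (row, col). ASSUMES row-major numbering."""
    return divmod(state, n_cols)


def to_state(row, col, n_cols):
    """Inverse of to_row_col under the same row-major assumption."""
    return row * n_cols + col


n_rows, n_cols = 12, 24
row, col = to_row_col(27, n_cols)
print(f"Start state 27 -> row {row}, column {col} (0-based) of a {n_rows} x {n_cols} grid")
```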


4.B - Value Iteration

Create an instance of the DPAgent class for the environment created in Step 4.A, with gamma=1 and random_state=1.

Run value iteration with the default parameters.

Display the environment again, this time setting fill to shade the cells according to the state-value function for the optimal policy, contents to display the optimal policy, size=3, and show_nums=False.
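DPAgent is a course-provided class whose internals are not shown here. As a hedged illustration of what value iteration computes, the sketch below runs it on a hypothetical 4-state slippery corridor (state 0 is a hole, state 3 is the goal, slip probability 0.2) stored in a FrozenLake-style table P[s][a] = [(prob, next_state, reward, done), ...]. The corridor, the reward of 1 for reaching the goal, and the function names are assumptions for illustration, not the DPAgent API.

```python
import numpy as np


def value_iteration(P, n_states, n_actions, gamma=1.0, tol=1e-8, max_iter=10_000):
    """Generic value iteration over a table P[s][a] -> [(prob, s2, r, done), ...]."""
    V = np.zeros(n_states)
    for _ in range(max_iter):
        Q = np.zeros((n_states, n_actions))
        for s in range(n_states):
            for a in range(n_actions):
                for prob, s2, r, done in P[s][a]:
                    # Bellman optimality backup; terminal transitions add no future value.
                    Q[s, a] += prob * (r + (0.0 if done else gamma * V[s2]))
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=1)  # greedy policy with respect to the final Q


def corridor(slip=0.2):
    """Hypothetical corridor: 0 = hole, 3 = goal; actions 0 = left, 1 = right."""
    hole, goal = 0, 3
    P = {}
    for s in range(4):
        if s in (hole, goal):
            # Terminal states: any action stays put with no reward.
            P[s] = {a: [(1.0, s, 0.0, True)] for a in (0, 1)}
            continue
        P[s] = {}
        for a, step in ((0, -1), (1, +1)):
            outcomes = []
            # Intended move with prob 1 - slip, opposite move with prob slip.
            for prob, s2 in ((1 - slip, s + step), (slip, s - step)):
                outcomes.append((prob, s2, 1.0 if s2 == goal else 0.0, s2 in (hole, goal)))
            P[s][a] = outcomes
    return P


V, policy = value_iteration(corridor(), n_states=4, n_actions=2, gamma=1.0)
print(np.round(V, 4), policy)
```

With gamma=1 and a reward only on reaching the goal, V(s) in this toy is the probability of eventually reaching the goal: solving V(1) = 0.8·V(2) and V(2) = 0.8 + 0.2·V(1) gives V(1) = 0.64/0.84 ≈ 0.762 and V(2) ≈ 0.952, with "move right" optimal in both non-terminal states.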


4.C - Average Performance

You will now study the average performance of an agent following the optimal policy found in 4.B.

You will estimate the agent's success rate, and will also determine the average path length for successful episodes as well as for unsuccessful episodes.

Starter code has been provided below. Fill in the blanks as required to accomplish the tasks described below.

The code should generate 10,000 episodes following the optimal policy. After each episode, determine whether the agent reached the goal. If so, increment the goal count and append the length of the resulting path to the list s_lengths. If the agent did not reach the goal, append the path length to the list f_lengths.

Then print messages regarding the success rate under the optimal policy, as well as the average path length for both successful and failed episodes.

import numpy as np

N = 10000
s_lengths = []   # path lengths of episodes that reached the goal
f_lengths = []   # path lengths of episodes that did not
goals = 0
np.random.seed(1)
for i in range(N):
    ep = ______.generate_episode(policy=______.policy)
    path_length = ______
    if ep.state == ep.______:
        goals += 1
        s_lengths.append(______)
    else:
        f_lengths.append(______)
sr = ______
print('When working under the optimal policy:')
print(f"The agent's success rate was {______:.4f}.")
print(f'The average path length for successful episodes was {np.mean(______):.1f}.')
print(f'The average path length for unsuccessful episodes was {np.mean(______):.1f}.')
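The blanks in the starter code depend on the course's own FrozenPlatform and DPAgent classes, so they are left as given. The bookkeeping pattern itself can be sketched with a hypothetical stand-in: toy_episode() below simulates an agent in a 4-state slippery corridor (state 0 is a hole, state 3 the goal, slip probability 0.2) that always tries to move right. Here "path length" counts moves taken; the course may instead count states visited, so treat that definition as an assumption. For this corridor, the exact success probability from state 1 is 0.64/0.84 ≈ 0.762, so the estimate should land close to that.

```python
import numpy as np


def toy_episode(start=1, slip=0.2, hole=0, goal=3):
    """HYPOTHETICAL stand-in for generate_episode(): try to move right, slip left w.p. `slip`.
    Returns (reached_goal, path_length), where path_length counts moves taken."""
    state, moves = start, 0
    while state not in (hole, goal):
        state += -1 if np.random.rand() < slip else 1
        moves += 1
    return state == goal, moves


N = 10000
s_lengths = []
f_lengths = []
goals = 0
np.random.seed(1)  # seed NumPy's global RNG, mirroring the starter code
for i in range(N):
    reached_goal, path_length = toy_episode()
    if reached_goal:
        goals += 1
        s_lengths.append(path_length)
    else:
        f_lengths.append(path_length)

sr = goals / N  # success rate = fraction of episodes that reached the goal
print('When working under the optimal policy:')
print(f"The agent's success rate was {sr:.4f}.")
print(f'The average path length for successful episodes was {np.mean(s_lengths):.1f}.')
print(f'The average path length for unsuccessful episodes was {np.mean(f_lengths):.1f}.')
```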

4.D - Visualizing Results

Use Matplotlib to create a 1 x 2 grid of subplots. Each subplot should contain a histogram showing the distribution of path lengths: one for successful episodes and the other for unsuccessful episodes. The figure should have the following characteristics:

Set the figure size to [8, 3].

The subplots should be titled "Successful Episodes" and "Unsuccessful Episodes".

The x-axis of each subplot should be labeled "Path Length" and the y-axis should be labeled "Episode Count".

Use 20 bins for each histogram.

Select a different color for each subplot. Set the edgecolor of the bars to 'black' or 'k'.

Set the xlim to be the same for both subplots. Select values that result in no bars getting cut off.

Set the ylim to be the same for both subplots. Select values that result in no bars getting cut off.
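A minimal sketch of the 4.D figure. The two arrays are hypothetical placeholders (drawn from Poisson distributions only so the code runs on its own); in the assignment, use s_lengths and f_lengths from 4.C. Instead of hard-coding limits, the shared xlim comes from the combined data range and the shared ylim from the tallest bar returned by ax.hist, which keeps every bar fully visible.

```python
import matplotlib.pyplot as plt
import numpy as np

# HYPOTHETICAL placeholder data; replace with the lists built in 4.C.
rng = np.random.default_rng(0)
s_lengths = rng.poisson(40, size=700) + 20
f_lengths = rng.poisson(15, size=300) + 1

fig, axes = plt.subplots(1, 2, figsize=[8, 3])
panels = [(axes[0], s_lengths, "Successful Episodes", "tab:blue"),
          (axes[1], f_lengths, "Unsuccessful Episodes", "tab:orange")]

max_count = 0
for ax, data, title, color in panels:
    # ax.hist returns (bar heights, bin edges, bar patches).
    counts, _, _ = ax.hist(data, bins=20, color=color, edgecolor="k")
    max_count = max(max_count, counts.max())
    ax.set_title(title)
    ax.set_xlabel("Path Length")
    ax.set_ylabel("Episode Count")

# Identical limits on both panels, chosen from the data so no bar is cut off.
lo = min(np.min(s_lengths), np.min(f_lengths))
hi = max(np.max(s_lengths), np.max(f_lengths))
for ax in axes:
    ax.set_xlim(lo - 1, hi + 1)
    ax.set_ylim(0, max_count * 1.1)

fig.tight_layout()
```

In a Jupyter notebook the figure renders at the end of the cell; in a standalone script, add plt.show(). Passing sharex=True, sharey=True to plt.subplots is an alternative way to force identical limits.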


4.E - Successful Episode

Use the generate_episode() method of the environment to simulate an episode following the optimal policy found by value iteration in 4.B. Set show_result=True and choose a value for random_state.

Call the display() method of the environment, setting the fill, contents, and show_path parameters so that cells are shaded to indicate the optimal state-value function, arrows for the optimal policy are displayed, and the path taken during the episode is shown.

Experiment with the value of random_state to find one that results in the agent reaching the goal. Use that value for your final submission.
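Finding a suitable random_state for 4.E and 4.F is trial and error, which a short loop can automate. The sketch below uses a hypothetical toy_path() in place of the course's generate_episode(); with the real environment you would instead test whether the episode ended in the goal cell, using whatever the course API exposes for that. The same helper covers 4.F by asking for a failed outcome.

```python
import numpy as np


def toy_path(random_state, start=1, slip=0.2, hole=0, goal=3):
    """HYPOTHETICAL stand-in for generate_episode(random_state=...): returns visited states.
    Tries to move right each step; slips left with probability `slip`."""
    rng = np.random.default_rng(random_state)
    path = [start]
    while path[-1] not in (hole, goal):
        step = -1 if rng.random() < slip else 1
        path.append(path[-1] + step)
    return path


def find_seed(want_success, goal=3, max_seed=1000):
    """Return the first random_state whose episode matches the wanted outcome, plus its path."""
    for seed in range(max_seed):
        path = toy_path(seed)
        if (path[-1] == goal) == want_success:
            return seed, path
    raise ValueError(f"no matching seed in range({max_seed})")


success_seed, success_path = find_seed(want_success=True)   # 4.E
fail_seed, fail_path = find_seed(want_success=False)         # 4.F
print("success:", success_seed, success_path)
print("failure:", fail_seed, fail_path)
```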


4.F - Failed Episode

Repeat Step 4.E, but this time find a value for random_state that results in a failed episode.
