
Question


help with python:
Average Path Length
In Part 4, you will apply value iteration to a relatively large Frozen Platform environment and will then study the average path length for successful and unsuccessful episodes run under the optimal policy.
4.A - Create Environment
Create a 12x24 instance of the FrozenPlatform environment with sp_range=[0.1,0.4], a start position of 27, 25 holes, and random_state=1. Display the environment, setting fill to shade the cells according to their slip probabilities, size=3, and show_nums=False.
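The course's FrozenPlatform class is not shown here, so the following is only a hypothetical sketch of what the 4.A parameters describe. It assumes states are numbered row-major (so start position 27 in a 24-column grid is row 1, column 3, counting from 0), that each cell's slip probability is drawn uniformly from sp_range, and that holes go on random cells other than the start. make_layout() and to_row_col() are illustrative names, and the sketch will not reproduce the layout the course library generates for random_state=1.

```python
import numpy as np

def make_layout(rows, cols, sp_range, start, n_holes, random_state):
    """Hypothetical stand-in for a FrozenPlatform-style layout (not the course class)."""
    rng = np.random.default_rng(random_state)
    n_states = rows * cols
    # One slip probability per cell, drawn uniformly from sp_range
    slip_probs = rng.uniform(sp_range[0], sp_range[1], size=n_states)
    # Choose distinct hole cells at random, never placing one on the start cell
    candidates = np.delete(np.arange(n_states), start)
    holes = rng.choice(candidates, size=n_holes, replace=False)
    return slip_probs.reshape(rows, cols), np.sort(holes)

def to_row_col(state, cols):
    # Assumed row-major numbering: state = row * cols + col
    return divmod(state, cols)

slip, holes = make_layout(12, 24, [0.1, 0.4], start=27, n_holes=25, random_state=1)
print(slip.shape, len(holes), to_row_col(27, 24))  # (12, 24) 25 (1, 3)
```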
4.B - Value Iteration
Create an instance of the DPAgent class for the environment created in Step 4.A, with gamma=1 and random_state=1. Run value iteration with the default parameters.
Display the environment again, this time setting fill to shade the cells according to the state-value function for the optimal policy, contents to display the optimal policy, size=3, and show_nums=False.
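Value iteration itself does not depend on the DPAgent class. A minimal sketch, assuming a hypothetical 1-D slippery corridor (state 0 is a hole, the last state is the goal, and with probability equal to the cell's slip probability the agent moves opposite to the chosen direction) and a reward of 1 for entering the goal and 0 otherwise; the course's reward scheme may differ. Under that reward and gamma=1, each state value is the probability of reaching the goal. The tolerance theta is an arbitrary choice here, not DPAgent's default.

```python
import numpy as np

def corridor_model(slip_probs):
    """Hypothetical 1-D slippery corridor: state 0 is a hole, the last state is the goal."""
    n = len(slip_probs)
    terminal = {0, n - 1}
    # P[s][a] is a list of (prob, next_state, reward); action 0 = left, 1 = right
    P = {}
    for s in range(n):
        P[s] = {}
        for a, step in ((0, -1), (1, +1)):
            if s in terminal:
                P[s][a] = [(1.0, s, 0.0)]  # absorbing, no further reward
                continue
            intended, slipped = s + step, s - step
            p = slip_probs[s]
            P[s][a] = [
                (1 - p, intended, 1.0 if intended == n - 1 else 0.0),
                (p, slipped, 1.0 if slipped == n - 1 else 0.0),
            ]
    return P

def value_iteration(P, n_states, gamma=1.0, theta=1e-10):
    """In-place value iteration; returns state values and a greedy policy."""
    V = np.zeros(n_states)

    def q_value(s, a):
        # Expected one-step return of taking action a in state s, then following V
        return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])

    while True:
        delta = 0.0
        for s in range(n_states):
            best = max(q_value(s, a) for a in P[s])  # Bellman optimality backup
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            break
    # Greedy policy with respect to the converged values
    policy = {s: max(P[s], key=lambda a: q_value(s, a)) for s in range(n_states)}
    return V, policy

slip = [0.0, 0.1, 0.3, 0.2, 0.4, 0.0]
V, policy = value_iteration(corridor_model(slip), len(slip))
print(np.round(V, 3), policy)
```

With every slip probability below 0.5, the greedy policy should point toward the goal in each non-terminal state, which mirrors the arrows the contents display in 4.B is meant to show.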
4.C - Average Performance
You will now study the average performance of an agent following the optimal policy found in 4.B. You will estimate the agent's success rate, and will also determine the average path length for successful episodes as well as for unsuccessful episodes.
Starter code has been provided below. Fill in the blanks as required to accomplish the tasks described below.
The code should generate 10,000 episodes following the optimal policy. After each episode, determine if the agent reached the goal. If so, increment the goal count and append the length of the resulting path to the list s_lengths. If the agent did not reach the goal, then append the path length to the list f_lengths.
Then print messages regarding the success rate under the optimal policy, as well as the average path length for both successful and failed episodes.
import numpy as np

N = 10000
s_lengths = []   # path lengths of episodes that reached the goal
f_lengths = []   # path lengths of episodes that did not
goals = 0
np.random.seed(1)
for i in range(N):
    # Simulate one episode under the optimal policy found in 4.B
    ep = ______.generate_episode(policy=______.policy)
    path_length = ______
    if ep.state == ep.______:
        goals += 1
        s_lengths.append(______)
    else:
        f_lengths.append(______)
sr = ______   # success rate over the N episodes
print('When working under the optimal policy:')
print(f"The agent's success rate was {______:.4f}.")
print(f'The average path length for successful episodes was {np.mean(______):.1f}.')
print(f'The average path length for unsuccessful episodes was {np.mean(______):.1f}.')
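The tallying logic in the starter code does not depend on the course API, so it can be tried on its own. A minimal sketch, with a hypothetical simulate_episode() standing in for generate_episode(): it follows "always move right" in a 1-D slippery corridor (state 0 is a hole, the last state is the goal) and assumes path length means the number of moves taken; the course library may count visited states instead.

```python
import numpy as np

def simulate_episode(slip_probs, start, rng):
    """Hypothetical stand-in for generate_episode(); returns (reached_goal, path_length)."""
    goal = len(slip_probs) - 1
    s, moves = start, 0
    while s not in (0, goal):
        # Intended move is right; a slip sends the agent left instead
        s += -1 if rng.random() < slip_probs[s] else 1
        moves += 1
    return s == goal, moves

def summarize(slip_probs, start, n_episodes=10_000, seed=1):
    rng = np.random.default_rng(seed)
    s_lengths, f_lengths = [], []
    goals = 0
    for _ in range(n_episodes):
        reached, path_length = simulate_episode(slip_probs, start, rng)
        if reached:
            goals += 1
            s_lengths.append(path_length)
        else:
            f_lengths.append(path_length)
    sr = goals / n_episodes  # fraction of episodes that reached the goal
    return sr, s_lengths, f_lengths

sr, s_lengths, f_lengths = summarize([0.0, 0.2, 0.3, 0.2, 0.1, 0.0], start=2)
print('When working under the optimal policy:')
print(f"The agent's success rate was {sr:.4f}.")
print(f'The average path length for successful episodes was {np.mean(s_lengths):.1f}.')
print(f'The average path length for unsuccessful episodes was {np.mean(f_lengths):.1f}.')
```

If an outcome never occurs (for example, no failed episodes), np.mean() of an empty list returns nan with a warning, so it is worth checking both lists are non-empty before printing.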
4.D - Visualizing Results
Use Matplotlib to create a 1x2 grid of subplots. Each subplot should contain a histogram indicating the distribution of path lengths. One histogram should correspond to path lengths for successful episodes and the other to unsuccessful episodes. The figure should have the following characteristics:
Set the figure size to be [8,3].
The subplots should be titled "Successful Episodes" and "Unsuccessful Episodes".
The x-axis of each subplot should be labeled "Path Length" and the y-axis should be labeled "Episode Count".
Use 20 bins for each histogram.
Select a different color for each subplot. Set the edgecolor of the bars to 'black' or 'k'.
Set the xlim to be the same for both subplots. Select values that result in no bars getting cut off.
Set the ylim to be the same for both subplots. Select values that result in no bars getting cut off.
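One way to meet these requirements is to read the bar heights back from ax.hist() and derive shared limits from the data. A minimal sketch; plot_path_lengths() is an illustrative name, the colors are arbitrary, and the random integers below are placeholder data for trying the function, not results.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

def plot_path_lengths(s_lengths, f_lengths, bins=20):
    fig, axes = plt.subplots(1, 2, figsize=[8, 3])
    panels = [(s_lengths, 'Successful Episodes', 'cornflowerblue'),
              (f_lengths, 'Unsuccessful Episodes', 'salmon')]
    max_count = 0
    for ax, (data, title, color) in zip(axes, panels):
        counts, _, _ = ax.hist(data, bins=bins, color=color, edgecolor='k')
        max_count = max(max_count, counts.max())
        ax.set_title(title)
        ax.set_xlabel('Path Length')
        ax.set_ylabel('Episode Count')
    # Shared limits wide enough that no bar in either panel is cut off
    lo = min(min(s_lengths), min(f_lengths))
    hi = max(max(s_lengths), max(f_lengths))
    pad = 0.05 * (hi - lo) or 1
    for ax in axes:
        ax.set_xlim(lo - pad, hi + pad)
        ax.set_ylim(0, 1.1 * max_count)
    fig.tight_layout()
    return fig, axes

# Placeholder data only; pass the s_lengths and f_lengths from 4.C in practice
rng = np.random.default_rng(0)
fig, axes = plot_path_lengths(rng.integers(5, 60, 500), rng.integers(1, 40, 300))
```

In a notebook, drop the matplotlib.use("Agg") line and call plt.show(). Deriving the limits from the data, rather than hard-coding them, keeps every bar visible whatever the 4.C results turn out to be.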
4.E - Successful Episode
Use the generate_episode() method of the environment to simulate an episode following the optimal policy found by value iteration in 4.B. Set show_result=True and choose any value for random_state.
Call the display() method of the environment, setting the fill, contents, and show_path parameters so that the cells are shaded according to the optimal state-value function, arrows for the optimal policy are displayed, and the path taken during the episode is shown.
Experiment with the value of random_state to find one that results in the agent finding the goal. Use that value for your final submission.
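Rather than trying seeds by hand, the experiment can be automated with a loop over candidate random_state values. In the notebook the loop would wrap the environment's generate_episode() call and check the outcome however the course library reports it; here a hypothetical episode_outcome() over a 1-D slippery corridor stands in so the sketch runs on its own. Both function names are illustrative.

```python
import numpy as np

def episode_outcome(slip_probs, start, random_state):
    """Hypothetical stand-in for one seeded episode; True if the goal is reached."""
    rng = np.random.default_rng(random_state)
    goal = len(slip_probs) - 1
    s = start
    while s not in (0, goal):
        # Intended move is right; a slip sends the agent left instead
        s += -1 if rng.random() < slip_probs[s] else 1
    return s == goal

def find_seed(slip_probs, start, want_goal, max_tries=1000):
    # Try random_state = 0, 1, 2, ... until the episode ends the way we want
    for seed in range(max_tries):
        if episode_outcome(slip_probs, start, seed) == want_goal:
            return seed
    return None

slip = [0.0, 0.3, 0.4, 0.3, 0.2, 0.0]
print(find_seed(slip, 2, want_goal=True), find_seed(slip, 2, want_goal=False))
```

Passing want_goal=False searches for a seed that produces a failed episode, which is the variant Step 4.F asks for.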
4.F - Failed Episode
Repeat Step 4.E, but this time find a value for random_state that results in a failed episode.
