
Question


Help with Q1
import numpy as np
import matplotlib.pyplot as plt
import time
from tqdm import tqdm
from aitools.algs import DPAgent, MCAgent
from aitools.envs import FrozenPlatform
Create Environment
An instance of the FrozenPlatform environment has been provided for you in this cell. Call the display() method of this instance with fill='slip' and contents='slip' to display the environment with the slip probabilities for each state.
run cells below
pi1={0:0,1:2,2:2,3:2,4:3,5:1,6:1,7:2,8:0,9:0,10:1,11:2,12:2,13:0,14:1,15:1,16:0}
pi2={0:0,1:2,2:2,3:2,4:3,5:1,6:2,7:2,8:0,9:0,10:1,11:2,12:2,13:0,14:1,15:1,16:0}
plt.subplot(1,2,1)
fp1.display(contents=pi1, fill=None, show_fig=False)
plt.subplot(1,2,2)
fp1.display(contents=pi2, fill=None, show_fig=False)
plt.show()
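Because both policies are plain dictionaries mapping a state to an action, it is easy to check where they disagree before evaluating them. A minimal sketch using only the two dictionaries from the cell above; the name diff_states is introduced here for illustration and is not part of the assignment.

```python
# Policies copied verbatim from the assignment cell (state -> action).
pi1 = {0:0, 1:2, 2:2, 3:2, 4:3, 5:1, 6:1, 7:2, 8:0, 9:0, 10:1, 11:2, 12:2, 13:0, 14:1, 15:1, 16:0}
pi2 = {0:0, 1:2, 2:2, 3:2, 4:3, 5:1, 6:2, 7:2, 8:0, 9:0, 10:1, 11:2, 12:2, 13:0, 14:1, 15:1, 16:0}

# States where the two policies choose different actions (hypothetical helper name).
diff_states = sorted(s for s in pi1 if pi1[s] != pi2[s])
print(diff_states)  # -> [6]
```

The policies differ only in state 6, so any difference in their value functions or success rates comes from that single choice.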
Create two instances of the DPAgent class, each using the environment created in Step 1.A, and each with gamma=1. One of the agents should be set to have policy pi1 and the other should have policy pi2. Run policy evaluation for both agents to evaluate the two policies.
Then display a 1x2 grid of subplots. Each subplot should show a display of the environment along with a policy. The first subplot should display pi1 and have cells shaded according to the value function for pi1. The second plot should be similar, but should use policy pi2 and its value function.
Note: You can copy the code for the subplots from 1.B, adjusting the arguments used for the fill and contents parameters.
Print the value of State 1 (the initial state) under each policy.
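The DPAgent API is not shown in the question, so the following is only a sketch of what policy evaluation computes, written without aitools. It assumes a made-up four-state environment (two ordinary states, a goal, and a hole), a slip probability of 0.2, and a reward of 1 for reaching the goal. The function evaluate_policy and the transition table P are hypothetical names, not the library's.

```python
def evaluate_policy(P, policy, terminal, gamma=1.0, tol=1e-10, max_iter=10_000):
    """Iterative policy evaluation on a tabular MDP.

    P[s][a] is a list of (probability, next_state, reward) triples.
    """
    V = {s: 0.0 for s in P}
    for _ in range(max_iter):
        delta = 0.0
        for s in P:
            if s in terminal:
                continue  # terminal states keep value 0
            v_new = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            break
    return V

# Toy environment (an assumption, not FrozenPlatform): action 0 moves one
# step forward with probability 0.8 and slips into the hole (state 3) otherwise.
P = {
    0: {0: [(0.8, 1, 0.0), (0.2, 3, 0.0)]},
    1: {0: [(0.8, 2, 1.0), (0.2, 3, 0.0)]},
    2: {},  # goal
    3: {},  # hole
}
policy = {0: 0, 1: 0}
V = evaluate_policy(P, policy, terminal={2, 3}, gamma=1.0)
print(f"Value of the initial state: {V[0]:.4f}")
```

In this toy setup, with gamma=1 and the only reward given at the goal, the value of the initial state equals the probability of reaching the goal. Whether the same holds for FrozenPlatform depends on its reward structure, which the question does not state.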
You will now estimate the agent's success rate when following each policy. This will be accomplished by generating 10,000 episodes according to each policy and then calculating the proportion of episodes that were successful.
Fill in the blanks in order to accomplish the requested task. Then print the two messages shown below, with the blanks filled in with the appropriate success rates, rounded to 4 decimal places. Aside from filling in the blanks, do not change any code provided.
N =10000
goals1=0
goals2=0
np.random.seed(1)
for i in range(N):
    ep1=______.generate_episode(policy=______)
    ep2=______.generate_episode(policy=______)
    if ep1.state == ep1.______:
        goals1+=1
    if ep2.state == ep2.______:
        goals2+=1
sr1=______
sr2=______
print(f"Under policy 1, the agent's success rate was {______:.4f}.")
print(f"Under policy 2, the agent's success rate was {______:.4f}.")
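The blanks above depend on variable and attribute names from the notebook that are not shown here, so they are left unfilled. As a rough illustration of the same idea, the sketch below estimates a success rate by simulation on a made-up environment (two steps to the goal, slip probability 0.2 into a hole on each step). The names run_episode and success_rate are hypothetical and do not come from aitools.

```python
import random

SLIP = 0.2          # assumed slip probability for the toy environment
GOAL, HOLE = 2, 3   # assumed terminal states

def run_episode(rng):
    """Simulate one episode from state 0 and return the terminal state reached."""
    state = 0
    while state not in (GOAL, HOLE):
        # With probability SLIP the agent falls into the hole; otherwise it advances.
        state = HOLE if rng.random() < SLIP else state + 1
    return state

def success_rate(n_episodes, seed=1):
    """Fraction of simulated episodes that end in the goal state."""
    rng = random.Random(seed)
    goals = sum(run_episode(rng) == GOAL for _ in range(n_episodes))
    return goals / n_episodes

sr = success_rate(10_000)
print(f"Estimated success rate: {sr:.4f}")
```

The structure mirrors the template: count the episodes whose final state is the goal, then divide the count by the number of episodes. For this toy environment the exact success probability is 0.8 * 0.8 = 0.64, so the estimate should land close to that value.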
