Question
Help with Q1
import numpy as np
import matplotlib.pyplot as plt
import time
from tqdm import tqdm
from aitools.algs import DPAgent, MCAgent
from aitools.envs import FrozenPlatform
Create Environment
An instance of the FrozenPlatform environment has been provided for you in this cell. Call the display method of this instance with fill='slip' and contents='slip' to display the environment with the slip probabilities for each state.
run cells below
pi:::::::::::::::::
pi:::::::::::::::::
plt.subplot(…)
fp.display(contents=pi…, fill=None, showfig=False)
plt.subplot(…)
fp.display(contents=pi…, fill=None, showfig=False)
plt.show()
Create two instances of the DPAgent class, each using the environment created in Step A and each with the given value of gamma. One of the agents should be set to have the first policy and the other should have the second policy. Run policy evaluation for both agents to evaluate the two policies.
Then display a grid of two subplots. Each subplot should show a display of the environment along with a policy. The first subplot should display the first policy and have cells shaded according to the value function for that policy. The second subplot should be similar, but should use the second policy and its value function.
Note: You can copy the code for the subplots from B, adjusting the arguments used for the fill and contents parameters.
Print the value of the initial state under each policy.
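For readers unfamiliar with what policy evaluation computes, here is a minimal sketch in plain Python. It does not use the aitools DPAgent class, whose interface is not shown in the question. The toy MDP, the function name `evaluate_policy`, the action names, and the choice gamma = 1.0 are all assumptions made for illustration only.

```python
# Iterative policy evaluation on a tiny, made-up MDP (NOT the FrozenPlatform API).
# transitions[state][action] is a list of (probability, next_state, reward) tuples.

def evaluate_policy(transitions, policy, gamma=1.0, tol=1e-10, max_iter=10_000):
    """Return the state-value function V of a deterministic policy."""
    V = {s: 0.0 for s in transitions}
    for _ in range(max_iter):
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: its value stays at 0
                continue
            # Bellman expectation backup for the action the policy picks in s.
            new_v = sum(p * (r + gamma * V[s2]) for p, s2, r in actions[policy[s]])
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:  # stop once a full sweep barely changes V
            break
    return V

# Hypothetical states: 0 = start, 1 = goal (terminal), 2 = hole (terminal).
# "careful" may slip and stay in place; "rush" never stays but fails more often.
toy_mdp = {
    0: {
        "careful": [(0.7, 1, 1.0), (0.2, 0, 0.0), (0.1, 2, 0.0)],
        "rush": [(0.5, 1, 1.0), (0.5, 2, 0.0)],
    },
    1: {},
    2: {},
}

v_careful = evaluate_policy(toy_mdp, {0: "careful"})
v_rush = evaluate_policy(toy_mdp, {0: "rush"})
print(f"Value of the initial state under 'careful': {v_careful[0]:.4f}")
print(f"Value of the initial state under 'rush':    {v_rush[0]:.4f}")
```

With a reward of 1 only on reaching the goal and gamma = 1, the value of the initial state in this toy example equals the probability of eventually reaching the goal, which is why comparing the two values compares the two policies.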
You will now estimate the agent's success rate when following each policy. This will be accomplished by generating episodes according to each policy and then calculating the proportion of episodes that were successful.
Fill in the blanks in order to accomplish the requested task. Then print the two messages shown below, with the blanks filled in with the appropriate success rates, rounded to the specified number of decimal places. Aside from filling in the blanks, do not change any code provided.
N
goals
goals
nprandom.seed
for i in rangeN:
epgenerateepisodepolicy
epgenerateepisodepolicy
if epstate ep:
goals
if epstate ep:
goals
sr
sr
printfUnder policy the agent's success rate was :f
printfUnder policy the agent's success rate was :f
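The idea behind this step, estimating a success rate as the fraction of simulated episodes that reach the goal, can be sketched without the course library. The example below is an assumption-laden illustration: `simulate_route`, `success_rate`, the two routes, their slip probabilities, and the seeds are all hypothetical and are not part of aitools or of the original exercise.

```python
import random

def simulate_route(slip_probs, rng):
    """Cross each cell in turn; return True if the agent never slips."""
    for p in slip_probs:
        if rng.random() < p:  # the agent slips on this cell and the episode fails
            return False
    return True  # every cell was crossed, so the goal was reached

def success_rate(slip_probs, n_episodes, seed):
    """Monte Carlo estimate: proportion of simulated episodes that succeed."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    goals = sum(simulate_route(slip_probs, rng) for _ in range(n_episodes))
    return goals / n_episodes

N = 10_000
short_route = [0.3, 0.3]           # two risky cells (hypothetical values)
long_route = [0.1, 0.1, 0.1, 0.1]  # four safer cells (hypothetical values)

sr_short = success_rate(short_route, N, seed=1)
sr_long = success_rate(long_route, N, seed=1)
print(f"Under the short route, the estimated success rate was {sr_short:.4f}")
print(f"Under the long route, the estimated success rate was {sr_long:.4f}")
```

In this toy setup the exact success probabilities are 0.7 × 0.7 = 0.49 and 0.9⁴ = 0.6561, so the estimates should land close to those values; the gap shrinks roughly in proportion to 1/√N as more episodes are generated.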