Question

Posted on Sep 26, 2024

Help with Q1

import numpy as np

import matplotlib.pyplot as plt

import time

from tqdm import tqdm

from aitools.algs import DPAgent, MCAgent

from aitools.envs import FrozenPlatform

Create Environment

An instance of the FrozenPlatform environment has been provided for you in this cell. Call the display() method of this instance with fill='slip' and contents='slip' to display the environment with the slip probabilities for each state.

Run the cells below.

pi1 = {0:0, 1:2, 2:2, 3:2, 4:3, 5:1, 6:1, 7:2, 8:0, 9:0, 10:1, 11:2, 12:2, 13:0, 14:1, 15:1, 16:0}
pi2 = {0:0, 1:2, 2:2, 3:2, 4:3, 5:1, 6:2, 7:2, 8:0, 9:0, 10:1, 11:2, 12:2, 13:0, 14:1, 15:1, 16:0}
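The two policies are nearly identical, which is easy to miss when reading the dictionaries side by side. The sketch below assumes the dictionaries are named pi1 and pi2 (the names are inferred from the task text) and that each maps a state to an action. It lists the states where the two policies choose different actions.

```python
# Policies as given above: state -> action.
# The names pi1 and pi2 are assumed from the task description.
pi1 = {0: 0, 1: 2, 2: 2, 3: 2, 4: 3, 5: 1, 6: 1, 7: 2, 8: 0,
       9: 0, 10: 1, 11: 2, 12: 2, 13: 0, 14: 1, 15: 1, 16: 0}
pi2 = {0: 0, 1: 2, 2: 2, 3: 2, 4: 3, 5: 1, 6: 2, 7: 2, 8: 0,
       9: 0, 10: 1, 11: 2, 12: 2, 13: 0, 14: 1, 15: 1, 16: 0}

# Collect every state where the chosen actions differ,
# together with the (pi1, pi2) action pair.
diff = {s: (pi1[s], pi2[s]) for s in pi1 if pi1[s] != pi2[s]}
print(diff)  # {6: (1, 2)}
```

Any difference in the value functions or success rates computed later therefore comes from the action taken in state 6.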

plt.subplot(1, 2, 1)
[…]1.display(contents=pi1, fill=None, show_fig=False)
plt.subplot(1, 2, 2)
[…]1.display(contents=pi2, fill=None, show_fig=False)
plt.show()

Create two instances of the DPAgent class, each using the environment created in Step 1.[…], and each with gamma=1. One of the agents should be set to have policy pi1 and the other should have policy pi2. Run policy evaluation for both agents to evaluate the two policies.
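For readers unsure what policy evaluation computes, here is a minimal sketch of iterative policy evaluation with gamma=1. It does not use DPAgent or FrozenPlatform, whose interfaces are not shown in the question. Instead it assumes a hypothetical five-state corridor in which state 0 is a hole, state 4 is the goal, reaching the goal pays a reward of 1, and the agent slips in the opposite direction with a fixed probability. All function and variable names are illustrative.

```python
def evaluate_policy(policy, slip, goal, hole, gamma=1.0, tol=1e-10):
    """Iterative policy evaluation on a toy corridor (hypothetical environment).

    policy: state -> action (1 = move right, anything else = move left)
    slip:   state -> probability of moving opposite to the chosen direction
    """
    V = {s: 0.0 for s in policy}
    while True:
        delta = 0.0
        for s in sorted(policy):
            if s in (goal, hole):
                continue  # terminal states keep value 0
            step = 1 if policy[s] == 1 else -1
            v_new = 0.0
            # Two outcomes: intended move, or a slip in the opposite direction.
            for prob, nxt in ((1 - slip[s], s + step), (slip[s], s - step)):
                reward = 1.0 if nxt == goal else 0.0
                v_new += prob * (reward + gamma * V[nxt])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


slip = {s: 0.2 for s in range(5)}
policy_a = {s: 1 for s in range(5)}   # always move right
policy_b = dict(policy_a)
policy_b[2] = 0                       # move left in state 2 only

V_a = evaluate_policy(policy_a, slip, goal=4, hole=0)
V_b = evaluate_policy(policy_b, slip, goal=4, hole=0)
print(f"Value of state 1 under policy_a: {V_a[1]:.4f}")
print(f"Value of state 1 under policy_b: {V_b[1]:.4f}")
```

In this toy setup the only reward is 1 at the goal and gamma is 1, so the value of the start state equals the probability of reaching the goal. Whether the same holds for FrozenPlatform depends on its reward structure, which the question does not state.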

Then display a 1x2 grid of subplots. Each subplot should show a display of the environment along with a policy. The first subplot should display pi1 and have cells shaded according to the value function for pi1. The second plot should be similar, but should use policy pi2 and its value function.

Note: You can copy the code for the subplots from 1.[…], adjusting the arguments used for the fill and contents parameters.

Print the value of State 1 (the initial state) under each policy.

You will now estimate the agent's success rate when following each policy. This will be accomplished by generating 10,000 episodes according to each policy and then calculating the proportion of episodes that were successful.

Fill in the blanks in order to accomplish the requested task. Then print the two messages shown below, with the blanks filled in with the appropriate success rates, rounded to 4 decimal places. Aside from filling in the blanks, do not change any code provided.

[…] = 10000
goals1 = 0
goals2 = 0
np.random.seed(1)
for i in range([…]):
    ep1 = ______.generate_episode(policy=______)
    ep2 = ______.generate_episode(policy=______)
    if ep1.state == […]1.______:
        goals1 += 1
    if ep2.state == […]2.______:
        goals2 += 1
[…]1 = ______
[…]2 = ______
print(f"Under policy 1, the agent's success rate was {______:.4f}.")
print(f"Under policy 2, the agent's success rate was {______:.4f}.")
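The estimation idea in the template above can be sketched without the course library. The example below is not the answer to the blanks: it assumes a hypothetical five-state corridor (hole at state 0, goal at state 4, start at state 1) with a fixed slip probability, and uses Python's random module in place of the agent's generate_episode method. It counts the episodes that end in the goal state and divides by the number of episodes.

```python
import random


def run_episode(policy, slip, start, goal, hole, rng):
    """Follow `policy` from `start` until a terminal state; return the final state."""
    s = start
    while s not in (goal, hole):
        step = 1 if policy[s] == 1 else -1
        if rng.random() < slip[s]:
            step = -step  # slipped: move the opposite way
        s += step
    return s


def success_rate(policy, slip, n_episodes, seed=1):
    """Proportion of simulated episodes that end in the goal state."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    goals = 0
    for _ in range(n_episodes):
        if run_episode(policy, slip, start=1, goal=4, hole=0, rng=rng) == 4:
            goals += 1
    return goals / n_episodes


slip = {s: 0.2 for s in range(5)}
always_right = {s: 1 for s in range(5)}  # hypothetical policy: 1 = move right

sr = success_rate(always_right, slip, n_episodes=10_000)
# Standard error of a proportion estimated from n independent episodes.
se = (sr * (1 - sr) / 10_000) ** 0.5
print(f"Under the toy policy, the agent's success rate was {sr:.4f}.")
print(f"Approximate standard error: {se:.4f}")
```

With 10,000 episodes the standard error of a proportion is at most 0.005, so an estimate of this kind is typically accurate to about two decimal places even though it is printed with four.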
