[Solved] Let us define a gridworld MDP, depicted


Question


Posted on Sep 26, 2024


Let us define a gridworld MDP, depicted in Figure 2. The states are grid squares, identified by their row and column number (row first). The agent always starts in state (1, 1), marked with the letter S. There are two terminal goal states, (2, 3) with reward +5 and (1, 3) with reward -5. Rewards are 0 in non-terminal states. (The reward for a state is received as the agent moves into the state.)

The transition function is such that the intended agent movement (Up, Down, Left, or Right) happens with a probability of 0.8. With a probability of 0.1 each, the agent ends up in one of the states perpendicular to the intended direction. If a collision with a wall happens, the agent stays in the same state.

Figure 2: Left: Gridworld MDP. Right: Transition function.
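The transition rule described above can be sketched in Python. This is only an illustration under stated assumptions, because Figure 2 is not reproduced here: the grid is assumed to be 2 rows by 3 columns (the smallest grid containing every state named in the question), "Up" is assumed to decrease the row number, and only the outer boundary is treated as a wall. The names `transition_probs`, `step`, and `reward` are hypothetical helpers, not part of the original problem.

```python
# Sketch of the noisy transition model from the problem statement.
# ASSUMPTIONS (Figure 2 is unavailable): a 2x3 grid with rows 1..2 and
# columns 1..3, "Up" decreases the row number, and the only walls are
# the outer boundary of the grid.
from collections import defaultdict

N_ROWS, N_COLS = 2, 3                      # assumed grid size
TERMINALS = {(2, 3): 5.0, (1, 3): -5.0}    # terminal state -> reward on entry

# (d_row, d_col) per action; the row direction of Up/Down is an assumption.
MOVES = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}
PERPENDICULAR = {
    "Up": ("Left", "Right"),
    "Down": ("Left", "Right"),
    "Left": ("Up", "Down"),
    "Right": ("Up", "Down"),
}


def step(state, direction):
    """Deterministic move; hitting the boundary leaves the state unchanged."""
    row, col = state
    d_row, d_col = MOVES[direction]
    new_row, new_col = row + d_row, col + d_col
    if 1 <= new_row <= N_ROWS and 1 <= new_col <= N_COLS:
        return (new_row, new_col)
    return state


def transition_probs(state, action):
    """Return {next_state: probability} for taking `action` in `state`."""
    probs = defaultdict(float)
    probs[step(state, action)] += 0.8      # intended direction
    for side in PERPENDICULAR[action]:
        probs[step(state, side)] += 0.1    # each perpendicular slip
    return dict(probs)


def reward(next_state):
    """Reward is received on entering a state; non-terminal states give 0."""
    return TERMINALS.get(next_state, 0.0)


print(transition_probs((1, 1), "Right"))
```

Under these assumptions, choosing Right in (1, 1) leads to (1, 2) with probability 0.8, to (2, 1) with probability 0.1, and leaves the agent in (1, 1) with probability 0.1. This is consistent with trajectory 3, where the always-go-right policy still moves the agent from (1, 1) to (2, 1).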

(a) Define the optimal policy for this gridworld MDP.

(b) Suppose the agent does not know the transition probabilities. What does the agent need to do in order to learn the optimal policy?

(c) The agent starts with the policy that always chooses to go right, and executes the following three trajectories: 1) (1, 1) - (1, 2) - (1, 3), 2) [trajectory missing in the source], and 3) (1, 1) - (2, 1) - (2, 2) - (2, 3). What are the First-Visit Monte Carlo estimates for states (1, 1) and (2, 2), given these trajectories? Suppose γ = 1.

(d) Using a learning rate of α = 0.1 and assuming initial values of 0, what updates does the TD-learning agent make after trials 1 and 2, above? For this part, suppose γ = 0.9.
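For the First-Visit Monte Carlo part, the estimate for a state is the average, over trajectories, of the return that follows the first visit to that state. The sketch below is a minimal illustration, not a full answer: it uses only trajectories 1 and 3, because trajectory 2 is not recoverable from the question text, so the printed numbers would change once the missing trajectory is included. The function name `first_visit_mc` is a hypothetical helper, and the discount factor is taken to be 1.

```python
# Minimal first-visit Monte Carlo sketch with a discount factor of 1.
# ASSUMPTION: only trajectories 1 and 3 are used, because trajectory 2
# is missing from the question text. Results are therefore partial.
from collections import defaultdict

GAMMA = 1.0
TERMINAL_REWARDS = {(2, 3): 5.0, (1, 3): -5.0}  # reward received on entry


def first_visit_mc(trajectories, gamma=GAMMA):
    """Average the return following the first visit to each state."""
    returns = defaultdict(list)
    for states in trajectories:
        # rewards[t] is the reward for moving from states[t] into states[t + 1]
        rewards = [TERMINAL_REWARDS.get(s, 0.0) for s in states[1:]]
        # Backward pass: discounted return from each time step
        g = 0.0
        returns_from = [0.0] * len(rewards)
        for t in reversed(range(len(rewards))):
            g = rewards[t] + gamma * g
            returns_from[t] = g
        # Record the return only for the first visit to each state
        seen = set()
        for t, s in enumerate(states[:-1]):
            if s not in seen:
                seen.add(s)
                returns[s].append(returns_from[t])
    return {s: sum(gs) / len(gs) for s, gs in returns.items()}


trajectories = [
    [(1, 1), (1, 2), (1, 3)],          # trajectory 1, ends with reward -5
    [(1, 1), (2, 1), (2, 2), (2, 3)],  # trajectory 3, ends with reward +5
]
values = first_visit_mc(trajectories)
print(values[(1, 1)], values[(2, 2)])
```

With just these two trajectories, state (1, 1) sees returns of -5 and +5, which average to 0, and state (2, 2) sees a single return of +5. Adding trajectory 2 would contribute another return for (1, 1), and for (2, 2) if that trajectory passes through it.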

Please answer all questions in detail. Thank you.
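For the TD-learning part, the standard TD(0) rule updates the value of a state after each step as V(s) ← V(s) + α (r + γ V(s') − V(s)). The sketch below applies that rule to trial 1 only, since trial 2 is missing from the question text. It assumes a learning rate of 0.1, a discount factor of 0.9, initial values of 0, and that terminal states keep a value of 0. The function name `td0_trial` is a hypothetical helper.

```python
# Minimal TD(0) sketch for one trial.
# ASSUMPTIONS: learning rate 0.1, discount 0.9, all values start at 0,
# and terminal states keep a value of 0. Only trial 1 is shown, because
# trial 2 is missing from the question text.
GAMMA = 0.9
ALPHA = 0.1
TERMINAL_REWARDS = {(2, 3): 5.0, (1, 3): -5.0}  # reward received on entry


def td0_trial(values, states, alpha=ALPHA, gamma=GAMMA):
    """Apply TD(0) updates along one trajectory; return (state, old, new) tuples."""
    updates = []
    for s, s_next in zip(states, states[1:]):
        r = TERMINAL_REWARDS.get(s_next, 0.0)
        v_next = values.get(s_next, 0.0)
        old = values.get(s, 0.0)
        new = old + alpha * (r + gamma * v_next - old)
        values[s] = new
        updates.append((s, old, new))
    return updates


values = {}
trial_1 = [(1, 1), (1, 2), (1, 3)]
for s, old, new in td0_trial(values, trial_1):
    print(s, old, "->", new)
```

In trial 1, the step from (1, 1) to (1, 2) carries reward 0 and both values are 0, so V(1, 1) is unchanged. The step from (1, 2) into the terminal state (1, 3) carries reward -5, giving V(1, 2) = 0 + 0.1 × (-5 + 0.9 × 0 − 0) = -0.5. Trial 2 would be processed the same way, starting from these updated values.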


