
Question


Posted on Aug 31, 2024

Q.4. Figure (a) below shows a house with 6 rooms, where the rooms labeled 0-4 are internal rooms and 5 is the outside "room". Doors lead from each room to some others, as shown.

An Agent can be placed in any of the 6 rooms, which can be considered his starting State. The objective is to take Actions to move from the starting room to room '5', which can also be called the Goal State. Transiting from one room (state) to another through a door is considered an Action that leads to a Reward. The Reward associated with each Action is expressed in Table R, which is like a Reward Matrix: each row represents a State and each column an Action, and the value of that Action is the corresponding element. All infeasible Actions (i.e., where no door exists to execute such an action) are shown as -1.
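As a minimal Python sketch of the -1 convention, the reward matrix can be held as a list of lists, and the feasible actions from a room read off as the columns whose entry is not -1. The helper name `feasible_actions` is hypothetical, and reading a column index as "the room the door leads to" is an assumption based on the description of Actions as moves between rooms.

```python
from typing import List

# Reward matrix: rows are states (rooms), columns are actions.
# Assumption: column j means "go through the door into room j".
RewardMatrix = List[List[float]]

INFEASIBLE = -1  # marks an action with no door, as in Table R


def feasible_actions(r: RewardMatrix, state: int) -> List[int]:
    """Return the column indices (actions) that are feasible from `state`."""
    return [a for a, reward in enumerate(r[state]) if reward != INFEASIBLE]
```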

You are to use the Q-Learning process to update the Q-matrix shown as Table Q. As you know, the Q-Learning process proceeds through Episodes: in each Episode a sequence of states is followed, updating the Q-matrix at each step, until the Goal (Terminal) state is reached, signaling the end of that Episode. The Q-Learning update equation is the following:

Q^{\text{new}}(s_t, a_t) = (1 - \alpha)\, Q^{\text{old}}(s_t, a_t) + \alpha \left( R(s_t, a_t) + \gamma \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) \right)

where all notations and symbols follow from what you have seen in your class.
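A single application of the update rule can be sketched in Python as below. This is an illustration under stated assumptions, not the course's reference code: the function name `q_update` is hypothetical, Q and R are plain lists of lists, and the max over a_{t+1} is taken only over actions that are feasible (reward entry not -1) from the next state.

```python
from typing import List

Matrix = List[List[float]]


def q_update(q: Matrix, r: Matrix, s: int, a: int, s_next: int,
             alpha: float, gamma: float) -> float:
    """Apply one Q-Learning update to q[s][a] in place and return the new value.

    Q_new(s, a) = (1 - alpha) * Q_old(s, a)
                  + alpha * (R(s, a) + gamma * max_a' Q(s_next, a'))
    Assumption: a' ranges over actions whose reward entry in row s_next is not -1;
    if there are none, the max term is treated as 0.
    """
    feasible_next = [a2 for a2, rew in enumerate(r[s_next]) if rew != -1]
    best_next = max((q[s_next][a2] for a2 in feasible_next), default=0.0)
    target = r[s][a] + gamma * best_next  # reward plus discounted best next value
    q[s][a] = (1 - alpha) * q[s][a] + alpha * target
    return q[s][a]
```

Because the function writes the new value back into `q`, later steps of the same Episode see the updated entry, which is what processing the sequence of states in order requires.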

Fig. (a): the house with six labelled rooms.

Table R: Rows are states, columns are Actions. Values are Rewards.

Table Q: Rows are states, columns are Actions. Elements are Q-Values.

You are to take α = 1 and γ = 0.8, and then use the given Q-Matrix and R-Matrix to update the Q-Matrix at each state in the following Episode (sequence of states): 2 => 3 => 1 => 5.
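Reading the two given values as α = 1 and γ = 0.8, the (1 - α) Q^old term vanishes, so each new value depends only on the reward and the discounted best value of the next room. Assuming that moving into room j corresponds to column j of both tables, the three updates of this Episode reduce to the following. Each max is taken over the Q row as it stands at that step (Q(1, ·) in the second update has not yet been changed by the third), and the numeric results come from Tables R and Q. Courses differ on whether the max term is kept when the next state is the terminal room 5; the formula as written keeps it.

```latex
\begin{aligned}
\alpha = 1,\ \gamma = 0.8:\quad
Q^{\text{new}}(s_t, a_t) &= R(s_t, a_t) + 0.8 \max_{a_{t+1}} Q(s_{t+1}, a_{t+1}) \\[4pt]
2 \to 3:\quad Q(2, 3) &= R(2, 3) + 0.8 \max_{a} Q(3, a) \\
3 \to 1:\quad Q(3, 1) &= R(3, 1) + 0.8 \max_{a} Q(1, a) \\
1 \to 5:\quad Q(1, 5) &= R(1, 5) + 0.8 \max_{a} Q(5, a)
\end{aligned}
```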

Please provide a handwritten solution if possible
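To replay the full Episode mechanically, the sketch below walks through consecutive pairs of rooms and applies the update at each step. It is a hypothetical helper (`run_episode`), assuming that Tables R and Q are entered as 6 x 6 lists of lists, that moving into room j is column j, and that the max term is kept at the step into the goal room, as the formula is written.

```python
from typing import List, Sequence

Matrix = List[List[float]]
INFEASIBLE = -1  # marker for "no door" in the reward matrix


def run_episode(q: Matrix, r: Matrix, states: Sequence[int],
                alpha: float = 1.0, gamma: float = 0.8) -> List[str]:
    """Update q in place along a fixed sequence of rooms and return a step log.

    Assumptions (not stated in the problem): the action for moving into room j
    is column j of both matrices, and the max term ranges over the actions that
    are feasible from the next room, including the goal room.
    """
    log = []
    # Walk consecutive pairs of rooms, e.g. [2, 3, 1, 5] -> (2,3), (3,1), (1,5).
    for s, s_next in zip(states, states[1:]):
        a = s_next  # action = destination room (assumption)
        if r[s][a] == INFEASIBLE:
            raise ValueError(f"no door from room {s} to room {s_next}")
        # Best current Q-value over the feasible actions of the next room.
        feasible = [a2 for a2, rew in enumerate(r[s_next]) if rew != INFEASIBLE]
        best_next = max((q[s_next][a2] for a2 in feasible), default=0.0)
        # Q-Learning update; with alpha = 1 this is R(s, a) + gamma * best_next.
        q[s][a] = (1 - alpha) * q[s][a] + alpha * (r[s][a] + gamma * best_next)
        log.append(f"Q({s},{a}) = {q[s][a]:g}")
    return log
```

For this Episode the call would be `run_episode(Q, R, [2, 3, 1, 5])`, which updates Q(2,3), Q(3,1) and Q(1,5) in that order and returns one log line per step; the actual numbers depend on the values in Tables R and Q.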


