Question

1 Approved Answer

Posted on Sep 22, 2024

4 - Assuming that all Q-values are initialized to 0, what are the Q-values for the following state-action pairs after running [tabular] Q-learning for the first episode? [Skip/disregard episodes 2 and 3.] Use discount factor $\gamma = 0.8$ and learning rate $\alpha = 0.6$.

( , Down)

Q(B, Up)

Hint: Use the following equations and update the Q-values after each transition until the end of episode 1.

Consider your new sample estimate:

$\text{target} = R(s, a, s') + \gamma \max_{a'} \hat{Q}(s', a')$

Incorporate the new estimate into a running average:

$\hat{Q}(s, a) \leftarrow (1 - \alpha)\, \hat{Q}(s, a) + \alpha\, [\text{target}]$
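The two hint equations can be applied mechanically, one transition at a time. The sketch below is a minimal illustration of that procedure in Python. The experience sequence from the question is not reproduced on this page, so the episode, its rewards, and the state names used here are hypothetical placeholders, as is the helper name `q_learning_episode`. Only the discount factor 0.8, the learning rate 0.6, and the zero initialization come from the question.

```python
from collections import defaultdict

GAMMA = 0.8  # discount factor given in the question
ALPHA = 0.6  # learning rate given in the question


def q_learning_episode(transitions, actions, gamma=GAMMA, alpha=ALPHA):
    """Apply the tabular Q-learning update to each transition in order.

    transitions: list of (s, a, r, s_next, done) tuples.
    actions: every action available in a non-terminal state.
    """
    q = defaultdict(float)  # all Q-values are initialized to 0
    for s, a, r, s_next, done in transitions:
        # A terminal next state contributes no future value.
        best_next = 0.0 if done else max(q[(s_next, a2)] for a2 in actions)
        target = r + gamma * best_next          # new sample estimate
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target  # running average
    return q


# HYPOTHETICAL episode, not the one from the question.
ACTIONS = ["Up", "Down"]
episode = [
    ("A", "Down", 2.0, "B", False),
    ("B", "Up", -1.0, "A", False),
    ("A", "Down", 2.0, "B", False),
    ("B", "Up", 10.0, "END", True),
]

q = q_learning_episode(episode, ACTIONS)
print(round(q[("A", "Down")], 5), round(q[("B", "Up")], 5))
```

To answer the actual question, replace `episode` with the transitions of episode 1 and read off the two requested entries of `q`.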

5 - Repeat part 4 if you run SARSA (temporal difference) with the above experience sequence (again assume that all Q-values are initialized to 0 and use only episode 1). Use discount factor $\gamma = 0.8$ and learning rate $\alpha = 0.6$.

Hint: Use the following equations and update the Q-values after each transition until the end of episode 1.

Sample of $\hat{Q}^{\pi}(s, a)$:

$\text{target} = R(s, a, s') + \gamma\, \hat{Q}^{\pi}(s', a')$

Update $\hat{Q}^{\pi}(s, a)$:

$\hat{Q}^{\pi}(s, a) \leftarrow (1 - \alpha)\, \hat{Q}^{\pi}(s, a) + \alpha\, \text{target}$
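The SARSA update differs from Q-learning only in the target: it uses the Q-value of the action actually taken next, not the maximum over next actions. The sketch below shows that in Python. The experience sequence from the question is not reproduced on this page, so the episode and the helper name `sarsa_episode` are hypothetical placeholders. Only the discount factor 0.8, the learning rate 0.6, and the zero initialization come from the question.

```python
from collections import defaultdict

GAMMA = 0.8  # discount factor given in the question
ALPHA = 0.6  # learning rate given in the question


def sarsa_episode(transitions, gamma=GAMMA, alpha=ALPHA):
    """Apply the tabular SARSA update to each transition in order.

    transitions: list of (s, a, r, s_next, a_next) tuples, where a_next is
    the action actually taken in s_next, or None if s_next is terminal.
    """
    q = defaultdict(float)  # all Q-values are initialized to 0
    for s, a, r, s_next, a_next in transitions:
        # On-policy target: value of the action actually taken next.
        next_value = 0.0 if a_next is None else q[(s_next, a_next)]
        target = r + gamma * next_value
        q[(s, a)] = (1 - alpha) * q[(s, a)] + alpha * target
    return q


# HYPOTHETICAL episode, not the one from the question.
episode = [
    ("A", "Down", 2.0, "B", "Up"),
    ("B", "Up", -1.0, "A", "Down"),
    ("A", "Down", 2.0, "B", "Up"),
    ("B", "Up", 10.0, "END", None),
]

q = sarsa_episode(episode)
print(round(q[("A", "Down")], 5), round(q[("B", "Up")], 5))
```

In this placeholder episode the third update bootstraps from the (slightly negative) current value of the action taken in B, whereas a max-based target would use the larger value of the untried action; that is the kind of difference the two parts of the question are meant to expose.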

