
Question

1 Approved Answer

Posted on Sep 11, 2024

Semi-Gradient Update

This problem presents a brief glimpse of the problems that can arise in off-policy learning with function approximation, through the concepts that have been introduced so far. If you would like a more detailed discussion on these issues, you may refer to Chapter 11.

Let us now apply semi-gradient TD learning from Chapter 9 with batch updates (Section 6.3) to the following value-function approximation problem, based on a problem known as Baird's Counterexample:

In the above diagram, each circle is a state. The arrows represent some possible transitions between states. The formulas shown in each state are for their values in terms of some parameters \(w_i\), \(i = 0, 1, \dots, 6\), that comprise the value function approximator that we wish to learn.

Consider that we see each of the transitions shown above exactly once in a batch of data. The reward for each transition is 0, and the discount factor is \(\gamma = 0.95\). We update the weights in the function approximation using semi-gradient TD(0). Before the update, the weights are \((1, 1, 1, 1, 1, 1, 5)\), with \(w_6 = 5\).

With a learning rate \(\alpha = 0.1\), what will be the weights after the update? Consider the updates happening in a batch, with the individual updates summed up and applied to the weights after computing the update for all six transitions. Enter your answers to 3 decimal places.


Feedback

General Feedback

Using the update rule given on page 203 of the textbook in the algorithm (not the algorithm itself, as this is a batch update), the contribution to the update from the leftmost transition is

Similarly, the contributions from the other transitions from left to right are:

And for the bottom "loop" transition, it is

\[ [-0.070, 0, 0, 0, 0, 0, -0.035]^{\top} \]
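The loop term can be checked by hand. The sketch below assumes the bottom state's value is \(2w_0 + w_6\), as in Baird's original counterexample; the diagram is not reproduced here, so this formula is an assumption, although it is consistent with the vector above. The symbols \(\hat v\), \(\delta\), and \(R\) follow the usual semi-gradient TD(0) notation and are not taken from the feedback text.

```latex
\begin{align*}
\hat v(s_{\text{bottom}}, \mathbf{w}) &= 2w_0 + w_6 = 2(1) + 5 = 7 \\
\delta &= R + \gamma\,\hat v(s', \mathbf{w}) - \hat v(s, \mathbf{w})
        = 0 + 0.95(7) - 7 = -0.35 \\
\alpha\,\delta\,\nabla \hat v(s, \mathbf{w})
       &= 0.1 \times (-0.35) \times [2, 0, 0, 0, 0, 0, 1]^{\top}
        = [-0.070, 0, 0, 0, 0, 0, -0.035]^{\top}
\end{align*}
```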

Adding the above updates together and adding the total update to \(\mathbf{w}\) gives the updated weights.
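The batch procedure described in the feedback can be sketched in a few lines of NumPy. This is a minimal illustration, not the course's reference solution. Because the diagram is missing, the feature vectors are an assumption borrowed from Baird's original counterexample: five upper states with value \(w_0 + 2w_i\) for \(i = 1, \dots, 5\), each transitioning to a bottom state with value \(2w_0 + w_6\), plus the bottom self-loop. The names `td0_contribution`, `upper`, and `bottom` are hypothetical.

```python
import numpy as np

GAMMA = 0.95  # discount factor (from the problem statement)
ALPHA = 0.1   # learning rate (from the problem statement)

# ASSUMPTION: features taken from Baird's original counterexample, since the
# diagram is not available. Upper state i has value w_0 + 2*w_i (i = 1..5);
# the bottom state has value 2*w_0 + w_6.
upper = np.zeros((5, 7))
upper[:, 0] = 1.0
upper[np.arange(5), np.arange(1, 6)] = 2.0
bottom = np.array([2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0])

# Initial weights (from the problem statement).
w = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 5.0])


def td0_contribution(x, x_next, w, reward=0.0):
    """Return alpha * delta * grad v(s, w) for one transition, linear v = w . x."""
    delta = reward + GAMMA * (w @ x_next) - (w @ x)
    return ALPHA * delta * x


# Six transitions: each upper state -> bottom, then the bottom self-loop.
transitions = [(x, bottom) for x in upper] + [(bottom, bottom)]

# Batch update: every contribution is computed with the OLD weights,
# then the sum is applied once.
contributions = np.array([td0_contribution(x, xn, w) for x, xn in transitions])
w_new = w + contributions.sum(axis=0)

print("loop contribution:", np.round(contributions[-1], 3))
print("updated weights:  ", np.round(w_new, 3))
print("new state values: ", np.round(np.vstack([upper, bottom]) @ w_new, 3))
```

Under these assumed features, the last row of `contributions` reproduces the loop vector quoted in the feedback, which is a useful sanity check. If the actual diagram uses different formulas, only `upper`, `bottom`, and `transitions` need to change.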


After this update, what will be the new value of the state whose value is calculated as ? Enter your answer to 3 decimal places.


And what will be the value of the state whose value is calculated as ? Enter your answer to 3 decimal places.
