
Question


Posted on Aug 01, 2024


The TD algorithm, the Monte Carlo method, and the $\lambda$-return algorithm look forward to approximate $v$. Alternatively, we can look backward via the eligibility trace method. The TD($\lambda$) algorithm is given by

$$z_t(s) = \gamma \lambda \, z_{t-1}(s) + \mathbb{I}_{\{s = s_t\}}, \quad \forall s \in S,$$

$$v_{t+1}(s) = v_t(s) + \alpha \, \delta_t \, z_t(s), \quad \forall s \in S,$$

where $z_t \in \mathbb{R}^{|S|}$ is called the eligibility vector and the initial $z_{-1}(s) = 0$ for all $s$.
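As a rough illustration, the two updates can be sketched in Python for a tabular problem. This is a minimal sketch rather than a reference implementation: the function name `td_lambda_episode`, the constant step size `alpha`, the trace decay `gamma * lam`, and the TD error `delta = r + gamma * v[s_next] - v[s]` are assumptions (the usual textbook choices), since the excerpt does not spell them out.

```python
import numpy as np

def td_lambda_episode(states, rewards, v, alpha=0.1, gamma=0.9, lam=0.8):
    """One episode of tabular TD(lambda) with accumulating eligibility traces.

    states  : visited state indices s_0, ..., s_T (length T + 1)
    rewards : rewards[t] is received on the transition s_t -> s_{t+1} (length T)
    v       : initial value estimates, shape (num_states,)
    """
    v = np.asarray(v, dtype=float).copy()
    z = np.zeros_like(v)                  # z_{-1}(s) = 0 for all s
    for t in range(len(rewards)):
        s, s_next = states[t], states[t + 1]
        z *= gamma * lam                  # decay the trace of every state
        z[s] += 1.0                       # add the indicator of the visited state
        # Assumed form of the TD error (not given in the excerpt).
        delta = rewards[t] + gamma * v[s_next] - v[s]
        v += alpha * delta * z            # every state moves in proportion to its trace
    return v, z
```

Calling `td_lambda_episode([0, 1, 0, 2], [1.0, 0.0, 2.0], np.zeros(3))` returns the updated estimates together with the final eligibility vector; a state that was never visited keeps a zero trace and therefore an unchanged estimate.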

In the TD($\lambda$) algorithm, $z_t$ is computed recursively. Express $z_t$ only in terms of the states visited in the past. This representation of the eligibility vector will show that eligibility vectors combine the frequency heuristic and the recency heuristic to address the credit assignment problem. For the rewards received, the frequency heuristic assigns higher credit to the frequently visited states, while the recency heuristic assigns higher credit to the recently visited states. The eligibility vector assigns higher credit to the frequently and recently visited states.
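One way to obtain such a representation is to unroll the recursion. The following is a sketch under the assumption that the trace decays by the standard factor $\gamma \lambda$ at every step and that $z_{-1}(s) = 0$:

```latex
\begin{align*}
z_t(s) &= \gamma\lambda \, z_{t-1}(s) + \mathbb{I}_{\{s_t = s\}} \\
       &= (\gamma\lambda)^2 z_{t-2}(s) + \gamma\lambda \, \mathbb{I}_{\{s_{t-1} = s\}} + \mathbb{I}_{\{s_t = s\}} \\
       &\;\;\vdots \\
       &= (\gamma\lambda)^{t+1} z_{-1}(s) + \sum_{k=0}^{t} (\gamma\lambda)^{t-k} \, \mathbb{I}_{\{s_k = s\}}
        = \sum_{k=0}^{t} (\gamma\lambda)^{t-k} \, \mathbb{I}_{\{s_k = s\}}.
\end{align*}
```

Each past visit to $s$ contributes one term (frequency), and a visit made $t - k$ steps ago is discounted by $(\gamma\lambda)^{t-k}$ (recency).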

Note that in the TD($\lambda$) algorithm, the value function estimate for every state gets updated, unlike in the $n$-step TD algorithms, where only the estimate for the current state gets updated. If a state has not been visited recently and frequently, then the eligibility of that state (i.e., the associated entry of the eligibility vector) will be close to zero. Therefore, the update via the $\delta$-error will take very small steps for such states.

Though the $\lambda$-return is forward-looking while TD($\lambda$) is backward-looking, they are equivalent, as you will show next for the finite-horizon problem with horizon length $T$.

Assume that the initial value function estimates are zero, i.e., $v_0(s) = 0$ for all $s$. Then, the recursive update in the $\lambda$-return algorithm yields that $v_T(s)$ can be written as

$$v_T(s) = \sum_{t=0}^{T-1} \alpha \left( G_t^{\lambda} - v_t(s_t) \right) \mathbb{I}_{\{s_t = s\}}.$$

Correspondingly, the recursive update in the TD($\lambda$) algorithm yields that $v_T(s)$ can be written as

$$v_T(s) = \sum_{t=0}^{T-1} \alpha \, \delta_t \, z_t(s).$$

Show that

$$\sum_{t=0}^{T-1} \alpha \, \delta_t \, z_t(s) = \sum_{t=0}^{T-1} \alpha \left( G_t^{\lambda} - v_t(s_t) \right) \mathbb{I}_{\{s_t = s\}}, \quad \forall s.$$
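Before attempting a proof, the identity can be sanity-checked numerically. The sketch below makes several assumptions that are not in the problem statement: the value function `v` is held fixed over the episode (offline updates), the TD error is `r_t + gamma * v[s_{t+1}] - v[s_t]`, the $\lambda$-return gives all remaining weight to the longest available $n$-step return, and the common factor $\alpha$ is dropped from both sides. All function names are hypothetical.

```python
import numpy as np

def forward_sum(states, rewards, v, gamma, lam, num_states):
    """Sum over t of (G_t^lambda - v(s_t)) * 1{s_t = s}, with v held fixed."""
    T = len(rewards)
    total = np.zeros(num_states)
    for t in range(T):
        g_lam, partial = 0.0, 0.0
        for n in range(1, T - t + 1):
            partial += gamma ** (n - 1) * rewards[t + n - 1]   # discounted rewards over n steps
            n_step = partial + gamma ** n * v[states[t + n]]   # n-step return
            if n < T - t:
                g_lam += (1 - lam) * lam ** (n - 1) * n_step
            else:
                g_lam += lam ** (n - 1) * n_step               # remaining weight on the last return
        total[states[t]] += g_lam - v[states[t]]
    return total

def backward_sum(states, rewards, v, gamma, lam, num_states):
    """Sum over t of delta_t * z_t(s), with v held fixed."""
    z = np.zeros(num_states)
    total = np.zeros(num_states)
    for t in range(len(rewards)):
        z *= gamma * lam
        z[states[t]] += 1.0
        delta = rewards[t] + gamma * v[states[t + 1]] - v[states[t]]  # assumed TD error
        total += delta * z
    return total

# Random trajectory on 4 states with horizon T = 8 (illustrative data only).
rng = np.random.default_rng(0)
states = rng.integers(0, 4, size=9).tolist()
rewards = rng.normal(size=8)
v = rng.normal(size=4)
print(np.allclose(forward_sum(states, rewards, v, 0.9, 0.7, 4),
                  backward_sum(states, rewards, v, 0.9, 0.7, 4)))
```

The check relies on the telescoping relation $G_t^{\lambda} - v(s_t) = \sum_{k=t}^{T-1} (\gamma\lambda)^{k-t} \delta_k$, followed by swapping the order of summation over $t$ and $k$; the same two steps are a natural starting point for the written proof.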
