Question

Posted on Aug 01, 2024

The TD($\lambda$) algorithm is given by

$$z_t(s) = \gamma \lambda \, z_{t-1}(s) + \mathbb{I}_{\{s = s_t\}}, \quad \forall s \in S,$$

$$v_{t+1}(s) = v_t(s) + \alpha \, \delta_t \, z_t(s), \quad \forall s \in S,$$

where $z_t \in \mathbb{R}^{|S|}$ is called the eligibility vector and initially $z_{-1}(s) = 0$ for all $s$.
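As a minimal sketch of the two updates above, the following Python function runs one episode of tabular TD($\lambda$). It assumes the standard reading of the symbols: a discount factor `gamma`, a trace-decay parameter `lam`, a constant step size `alpha`, and the one-step TD error $\delta_t = r_{t+1} + \gamma v_t(s_{t+1}) - v_t(s_t)$, with the value at the end of the horizon taken as zero. The function name and the episode format are hypothetical, chosen only for illustration.

```python
def td_lambda_episode(states, rewards, n_states, alpha=0.1, gamma=0.9, lam=0.8):
    """Run one episode of tabular TD(lambda), starting from v_0 = 0.

    states  : visited states s_0, ..., s_{T-1} (integers in range(n_states))
    rewards : rewards[t] is the reward received after leaving states[t]
    """
    v = [0.0] * n_states  # v_0(s) = 0 for all s
    z = [0.0] * n_states  # z_{-1}(s) = 0 for all s
    T = len(states)
    for t in range(T):
        s = states[t]
        # Value of the successor state; zero once the horizon is reached (assumption).
        v_next = v[states[t + 1]] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * v_next - v[s]  # assumed one-step TD error
        # Eligibility update: decay every entry, then bump the visited state.
        z = [gamma * lam * z_i for z_i in z]
        z[s] += 1.0
        # Every state's estimate moves in proportion to its eligibility.
        v = [v_i + alpha * delta * z_i for v_i, z_i in zip(v, z)]
    return v, z


v, z = td_lambda_episode([0, 1, 0], [1.0, 0.0, 2.0], n_states=2,
                         alpha=0.5, gamma=1.0, lam=0.5)
print(v, z)
```

Note that the loop updates the whole vector `v` at every step, not only the entry of the current state.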

( ) In the TD($\lambda$) algorithm, $z_t$ is computed recursively. Express $z_t$ only in terms of the states visited in the past. This representation of the eligibility vector will show that eligibility vectors combine the frequency heuristic and the recency heuristic to address the credit assignment problem. For the rewards received, the frequency heuristic assigns higher credit to the frequently visited states, while the recency heuristic assigns higher credit to the recently visited states. The eligibility vector assigns higher credit to the frequently and recently visited states.
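One way to sanity-check a candidate answer is numerically. Unrolling the recursion from $z_{-1}(s) = 0$ suggests the form $z_t(s) = \sum_{k=0}^{t} (\gamma\lambda)^{t-k} \, \mathbb{I}_{\{s_k = s\}}$, assuming the per-step decay factor is $\gamma\lambda$ (the usual TD($\lambda$) convention). The sketch below compares that sum with the recursion on a short trajectory; the function names and the single `decay` parameter standing in for $\gamma\lambda$ are hypothetical.

```python
def trace_recursive(states, n_states, decay):
    """z_t via the recursion z_t = decay * z_{t-1} + indicator(s = s_t)."""
    z = [0.0] * n_states  # z_{-1}(s) = 0
    for s_t in states:
        z = [decay * z_i for z_i in z]
        z[s_t] += 1.0
    return z


def trace_closed_form(states, n_states, decay):
    """z_t(s) as a sum over past visits k of decay**(t - k) * indicator(s_k = s)."""
    t = len(states) - 1
    z = [0.0] * n_states
    for k, s_k in enumerate(states):
        z[s_k] += decay ** (t - k)
    return z


visited = [0, 1, 0, 0, 2]
z_rec = trace_recursive(visited, 3, decay=0.5)
z_sum = trace_closed_form(visited, 3, decay=0.5)
print(z_rec, z_sum)
```

On this trajectory both functions give `[0.8125, 0.125, 1.0]`. State 0 was visited three times and state 2 once, but state 2 was visited at the last step, so both the number of visits and their recency show up in the trace.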

Note that in the TD($\lambda$) algorithm, the value function estimate for every state gets updated, unlike in the $n$-step TD algorithms, where only the estimate for the current state gets updated. If a state has not been visited recently and frequently, then the eligibility of that state (i.e., the associated entry of the eligibility vector) will be close to zero. Therefore, the update via the $\delta$-error will take very small steps for such states.
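To make the size of these steps concrete, here is a small sketch that computes the increment $\alpha \, \delta_t \, z_t(s)$ for every state after a trajectory. It assumes a decay factor $\gamma\lambda$ and a constant step size $\alpha$. The helper name, the parameter values, and the fixed error `delta=1.0` are illustrative assumptions, not part of the problem.

```python
def update_sizes(states, n_states, delta, alpha=0.1, gamma=0.9, lam=0.8):
    """Return alpha * delta * z_t(s) for every state s after the last step in `states`."""
    z = [0.0] * n_states
    for s_t in states:
        z = [gamma * lam * z_i for z_i in z]  # decay all eligibilities
        z[s_t] += 1.0                         # bump the visited state
    return [alpha * delta * z_i for z_i in z]


# State 0 is visited once, long ago; state 1 is visited ten times afterwards;
# state 2 is never visited.
steps = update_sizes([0] + [1] * 10, n_states=3, delta=1.0)
print(steps)
```

The stale state 0 receives a step that is smaller by a factor of about $0.72^{10}$ than it would right after its visit. The never-visited state 2 is not moved at all, and the frequently and recently visited state 1 receives the largest step.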

Though the $\lambda$-return is forward-looking while TD($\lambda$) is backward-looking, they are equivalent, as you will show next for the finite-horizon problem with horizon length $T$.

( ) Assume that the initial value function estimates are zero, i.e., $v_0(s) = 0$ for all $s$. Then, the recursive update in the $\lambda$-return algorithm yields that $v_T(s)$ can be written as

$$v_T(s) = \sum_{t=0}^{T-1} \alpha \left( G_t^{\lambda} - v_t(s_t) \right) \mathbb{I}_{\{s_t = s\}}.$$

Correspondingly, the recursive update in the TD($\lambda$) algorithm yields that $v_T(s)$ can be written as

$$v_T(s) = \sum_{t=0}^{T-1} \alpha \, \delta_t \, z_t(s).$$

Show that

$$\sum_{t=0}^{T-1} \alpha \, \delta_t \, z_t(s) = \sum_{t=0}^{T-1} \alpha \left( G_t^{\lambda} - v_t(s_t) \right) \mathbb{I}_{\{s_t = s\}}, \quad \forall s.$$
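Before attempting the proof, the identity can be checked numerically. The sketch below is an illustration under stated assumptions, not the intended solution: the value estimates are held fixed while the increments are accumulated (offline updating), the value beyond the horizon is zero, and the $\lambda$-return is taken as the usual truncated mixture of $n$-step returns. With estimates that change inside the episode, the two sides generally agree only approximately. All function names are hypothetical.

```python
def n_step_return(t, n, states, rewards, v, gamma):
    """n-step return from time t; bootstraps from v unless the horizon is reached."""
    T = len(states)
    g = sum(gamma ** i * rewards[t + i] for i in range(n))
    if t + n < T:
        g += gamma ** n * v[states[t + n]]
    return g


def lambda_return(t, states, rewards, v, gamma, lam):
    """Truncated lambda-return: weighted n-step returns plus the full-return tail."""
    T = len(states)
    g = sum((1 - lam) * lam ** (n - 1)
            * n_step_return(t, n, states, rewards, v, gamma)
            for n in range(1, T - t))
    return g + lam ** (T - t - 1) * n_step_return(t, T - t, states, rewards, v, gamma)


def forward_total(states, rewards, v, n_states, alpha, gamma, lam):
    """Sum over t of alpha * (G_t^lambda - v(s_t)) * indicator(s_t = s)."""
    total = [0.0] * n_states
    for t, s_t in enumerate(states):
        g = lambda_return(t, states, rewards, v, gamma, lam)
        total[s_t] += alpha * (g - v[s_t])
    return total


def backward_total(states, rewards, v, n_states, alpha, gamma, lam):
    """Sum over t of alpha * delta_t * z_t(s), with the trace built recursively."""
    T = len(states)
    total = [0.0] * n_states
    z = [0.0] * n_states
    for t, s_t in enumerate(states):
        v_next = v[states[t + 1]] if t + 1 < T else 0.0
        delta = rewards[t] + gamma * v_next - v[s_t]
        z = [gamma * lam * z_i for z_i in z]
        z[s_t] += 1.0
        total = [tot + alpha * delta * z_i for tot, z_i in zip(total, z)]
    return total


states = [0, 1, 0, 2, 1]
rewards = [1.0, 0.0, -1.0, 2.0, 0.5]
v0 = [0.0, 0.0, 0.0]  # matches the assumption v_0(s) = 0
fwd = forward_total(states, rewards, v0, 3, alpha=0.1, gamma=0.9, lam=0.7)
bwd = backward_total(states, rewards, v0, 3, alpha=0.1, gamma=0.9, lam=0.7)
print(fwd)
print(bwd)
```

The two printed vectors should agree up to floating-point rounding. The proof itself typically proceeds by writing $G_t^{\lambda} - v(s_t)$ as a $(\gamma\lambda)$-discounted sum of future TD errors and then swapping the order of summation.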
