Question

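The TD($\lambda$) algorithm is given by
$$z_t(s) = \gamma\lambda\, z_{t-1}(s) + \mathbb{I}\{s = s_t\}, \quad \forall s \in S,$$
$$v_{t+1}(s) = v_t(s) + \alpha\,\delta_t\, z_t(s), \quad \forall s \in S,$$
where $z_t \in \mathbb{R}^{|S|}$ is called the eligibility vector and initially $z_{-1}(s) = 0$ for all $s$.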
(c) In the TD($\lambda$) algorithm, $z_t$ is computed recursively. Express $z_t$ only in terms of the states
visited in the past. This representation of the eligibility vector will show that eligibility
vectors combine the frequency heuristic and the recency heuristic to address the credit assignment
problem. For the rewards received, the frequency heuristic assigns higher credit to
the frequently visited states, while the recency heuristic assigns higher credit to the recently
visited states. The eligibility vector assigns higher credit to states that are both frequently and
recently visited.
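
As a rough numerical illustration of part (c), the sketch below compares the recursive trace with a direct sum over past visits, $z_t(s) = \sum_{k=0}^{t} (\gamma\lambda)^{t-k}\, \mathbb{I}\{s_k = s\}$. It assumes the standard accumulating trace with decay factor $\gamma\lambda$; the trajectory, the decay value, and the function names are made up for the example.

```python
# Sketch: accumulating eligibility trace, recursive form vs. sum over past visits.
# Assumptions (not from the problem text): decay = gamma * lambda, toy trajectory.

def recursive_traces(states, n_states, decay):
    """Return [z_0, z_1, ...] using z_t(s) = decay * z_{t-1}(s) + 1{s == s_t}."""
    z = [0.0] * n_states                     # z_{-1}(s) = 0 for all s
    history = []
    for s_t in states:
        z = [decay * z_s for z_s in z]       # every entry fades (recency)
        z[s_t] += 1.0                        # the visited state is bumped (frequency)
        history.append(z)
    return history


def closed_form_trace(states, n_states, decay, t):
    """z_t(s) = sum over k <= t of decay**(t - k) * 1{s_k == s}."""
    z = [0.0] * n_states
    for k in range(t + 1):
        z[states[k]] += decay ** (t - k)
    return z


trajectory = [0, 1, 0, 2, 0]                 # hypothetical visits over states {0, 1, 2}
decay = 0.9 * 0.8                            # hypothetical gamma * lambda
traces = recursive_traces(trajectory, 3, decay)

# The two computations agree at every time step.
for t in range(len(trajectory)):
    direct = closed_form_trace(trajectory, 3, decay, t)
    assert all(abs(a - b) < 1e-12 for a, b in zip(traces[t], direct))

print(traces[-1])
```

In this toy run, state 0 (visited three times, including the last step) ends with the largest entry, state 2 (visited once, recently) comes next, and state 1 (visited once, long ago) has the smallest.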
Note that in the TD($\lambda$) algorithm, the value function estimate for every state gets updated, unlike
in the n-step TD algorithms, where only the estimate for the current state gets updated. If
a state has not been visited recently and frequently, then the eligibility of that state (i.e., the
associated entry of the eligibility vector) will be close to zero. Therefore, the update via the
TD error takes very small steps for such states.
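
To make the "every state gets updated" point concrete, here is a minimal tabular sketch of the backward-view update on a single short episode. It assumes the usual TD error $\delta_t = r_{t+1} + \gamma v_t(s_{t+1}) - v_t(s_t)$, a terminal state with value zero, and a decay factor $\gamma\lambda$ for the trace; the function name, the episode, and the parameter values are hypothetical.

```python
# Sketch: one episode of tabular TD(lambda), backward view.
# Assumptions (not from the problem text): the form of the TD error, terminal
# value 0, trace decay gamma * lam, and the toy episode below.

def td_lambda_episode(states, rewards, n_states, alpha, gamma, lam):
    """states has len(rewards) + 1 entries; the last one is terminal."""
    v = [0.0] * n_states                     # v_0(s) = 0 for all s
    z = [0.0] * n_states                     # z_{-1}(s) = 0 for all s
    for t, r in enumerate(rewards):
        s, s_next = states[t], states[t + 1]
        is_last = (t == len(rewards) - 1)
        target = r + (0.0 if is_last else gamma * v[s_next])
        delta = target - v[s]                # TD error at time t
        z = [gamma * lam * z_i for z_i in z] # fade all traces
        z[s] += 1.0                          # mark the current state
        # Every state moves, scaled by its own eligibility.
        v = [v_i + alpha * delta * z_i for v_i, z_i in zip(v, z)]
    return v, z


# Chain 0 -> 1 -> 2 -> 3 (terminal), with a single reward on the last step.
v_final, z_final = td_lambda_episode(
    states=[0, 1, 2, 3], rewards=[0.0, 0.0, 1.0],
    n_states=4, alpha=0.5, gamma=1.0, lam=0.5,
)
print(v_final)  # [0.125, 0.25, 0.5, 0.0]
```

The final reward reaches all three visited states in a single step, but the step shrinks with the time since the visit: state 2 moves by 0.5, state 1 by 0.25, and state 0 by 0.125.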
Though the $\lambda$-return is forward-looking while TD($\lambda$) is backward-looking, they are equivalent,
as you will show next for the finite-horizon problem with horizon length $T$.
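(d) Assume that the initial value function estimates are zero, i.e., $v_0(s) = 0$ for all $s$. Then, the
recursive update in the $\lambda$-return algorithm yields that $v_T(s)$ can be written as
$$v_T(s) = \sum_{t=0}^{T-1} \alpha\,\bigl(G_t^{\lambda} - v_t(s_t)\bigr)\,\mathbb{I}\{s_t = s\}.$$
Correspondingly, the recursive update in the TD($\lambda$) algorithm yields that $v_T(s)$ can be written as
$$v_T(s) = \sum_{t=0}^{T-1} \alpha\,\delta_t\, z_t(s).$$
Show that
$$\sum_{t=0}^{T-1} \alpha\,\delta_t\, z_t(s) = \sum_{t=0}^{T-1} \alpha\,\bigl(G_t^{\lambda} - v_t(s_t)\bigr)\,\mathbb{I}\{s_t = s\}, \quad \forall s.$$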
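
A numerical sanity check of the identity in (d) can be sketched as follows. It is not a proof, and it rests on assumptions the problem text does not spell out: the value estimates are held fixed during the episode (the offline setting), the TD error is $\delta_t = r_{t+1} + \gamma v(s_{t+1}) - v(s_t)$ with terminal value zero, and $G_t^{\lambda}$ is the usual $(1-\lambda)$-weighted mix of n-step returns, with the full return taking the leftover weight. A nonzero fixed $v$ is used so the check is not trivial; all names and numbers are hypothetical.

```python
# Sketch: forward (lambda-return) sum vs. backward (trace) sum for one episode.
# Assumptions: v is held fixed during the episode, terminal value is 0,
# rewards[t] is the reward received on leaving states[t].

def lambda_return(t, states, rewards, v, gamma, lam):
    """G_t^lambda for an episode of length T = len(rewards)."""
    T = len(rewards)
    g_lam, reward_sum = 0.0, 0.0
    for n in range(1, T - t + 1):
        reward_sum += gamma ** (n - 1) * rewards[t + n - 1]
        if t + n < T:                        # n-step return bootstraps from v
            n_step = reward_sum + gamma ** n * v[states[t + n]]
            g_lam += (1 - lam) * lam ** (n - 1) * n_step
        else:                                # full return takes the remaining weight
            g_lam += lam ** (n - 1) * reward_sum
    return g_lam


def forward_sum(states, rewards, v, n_states, alpha, gamma, lam):
    total = [0.0] * n_states
    for t in range(len(rewards)):
        s = states[t]
        total[s] += alpha * (lambda_return(t, states, rewards, v, gamma, lam) - v[s])
    return total


def backward_sum(states, rewards, v, n_states, alpha, gamma, lam):
    T = len(rewards)
    z = [0.0] * n_states
    total = [0.0] * n_states
    for t in range(T):
        v_next = 0.0 if t + 1 == T else v[states[t + 1]]
        delta = rewards[t] + gamma * v_next - v[states[t]]
        z = [gamma * lam * z_i for z_i in z]
        z[states[t]] += 1.0
        total = [tot + alpha * delta * z_i for tot, z_i in zip(total, z)]
    return total


states = [0, 1, 0, 2, 1]                     # hypothetical visited states, T = 5
rewards = [1.0, 0.0, -1.0, 2.0, 0.5]         # hypothetical rewards
v_fixed = [0.3, -0.2, 0.7]                   # hypothetical fixed estimates
args = (states, rewards, v_fixed, 3, 0.1, 0.9, 0.6)

forward = forward_sum(*args)
backward = backward_sum(*args)
assert all(abs(f - b) < 1e-9 for f, b in zip(forward, backward))
print(forward, backward)
```

The two sums agree state by state because, for fixed $v$, $G_t^{\lambda} - v(s_t) = \sum_{k=t}^{T-1} (\gamma\lambda)^{k-t}\,\delta_k$, and swapping the order of summation over $t$ and $k$ turns the visit indicators into the eligibility trace.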