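The TD(λ) algorithm is given by

$$z_t(s) = \gamma\lambda\, z_{t-1}(s) + \mathbb{I}\{s = s_t\}, \quad \forall s \in \mathcal{S},$$
$$V_{t+1}(s) = V_t(s) + \alpha\, \delta_t\, z_t(s), \quad \forall s \in \mathcal{S},$$

where $z_t$ is called the eligibility vector, $\delta_t$ is the TD error, and the initial $z_{-1}(s) = 0$ for all $s \in \mathcal{S}$.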
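A minimal sketch of one episode of tabular TD(λ) with accumulating traces, assuming states are integer indices into a NumPy array, the decay factor is $\gamma\lambda$, the last state $s_T$ is terminal with value 0, and `rewards[t]` is the reward received on the transition $s_t \to s_{t+1}$. The function name `td_lambda_episode` and these indexing conventions are illustrative assumptions, not part of the problem statement.

```python
import numpy as np

def td_lambda_episode(V, states, rewards, alpha, gamma, lam):
    """Online tabular TD(lambda) with accumulating traces over one episode.

    Assumed (hypothetical) conventions:
      states  = [s_0, ..., s_T], with s_T terminal (its value is taken as 0)
      rewards = [r_1, ..., r_T], where rewards[t] follows s_t -> s_{t+1}
    """
    V = np.array(V, dtype=float)          # copy so the caller's array is untouched
    z = np.zeros_like(V)                  # z_{-1}(s) = 0 for all s
    T = len(rewards)
    for t in range(T):
        s = states[t]
        v_next = 0.0 if t + 1 == T else V[states[t + 1]]  # terminal value is 0
        delta = rewards[t] + gamma * v_next - V[s]         # TD error delta_t
        z *= gamma * lam                  # decay every trace by gamma * lambda
        z[s] += 1.0                       # add the indicator I{s = s_t}
        V += alpha * delta * z            # every state moves in proportion to its trace
    return V
```

For example, `td_lambda_episode(np.zeros(3), [0, 1, 0, 2], [0.0, 0.0, 1.0], alpha=0.1, gamma=0.9, lam=0.8)` credits the single reward to state 0 (visited twice) and, to a lesser degree, state 1.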
(c) In the TD(λ) algorithm, $z_t$ is computed recursively. Express $z_t$ only in terms of the states visited in the past. This representation of the eligibility vector will show that eligibility vectors combine the frequency heuristic and the recency heuristic to address the credit assignment problem. For the rewards received, the frequency heuristic assigns higher credit to frequently visited states, while the recency heuristic assigns higher credit to recently visited states. The eligibility vector assigns higher credit to states that are visited both frequently and recently.
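Unrolling the recursion from $z_{-1} = 0$ should give $z_t(s) = \sum_{k=0}^{t} (\gamma\lambda)^{t-k}\, \mathbb{I}\{s_k = s\}$, assuming the decay factor is $\gamma\lambda$: each past visit contributes one term (frequency), discounted by how long ago it happened (recency). The sketch below checks this numerically; the helper names `traces_recursive` and `traces_closed_form` are hypothetical.

```python
import numpy as np

def traces_recursive(states, n_states, gamma, lam):
    """Return [z_0, ..., z_{T-1}] from z_t = gamma*lam*z_{t-1} + e_{s_t}, z_{-1} = 0."""
    z = np.zeros(n_states)
    out = []
    for s in states:
        z = gamma * lam * z               # decay all traces
        z[s] += 1.0                       # bump the trace of the visited state
        out.append(z.copy())
    return np.array(out)

def traces_closed_form(states, n_states, gamma, lam):
    """Return z_t(s) = sum_{k<=t} (gamma*lam)^(t-k) * I{s_k = s}, without recursion."""
    T = len(states)
    out = np.zeros((T, n_states))
    for t in range(T):
        for k in range(t + 1):            # one term per past visit
            out[t, states[k]] += (gamma * lam) ** (t - k)
    return out
```

With $\gamma = 1$, $\lambda = 0.5$ and visits `[0, 0, 0, 1]`, the final trace is 0.875 for state 0 (three older visits) and 1.0 for state 1 (one fresh visit), so a frequent but older state and a single recent visit can earn comparable credit.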
Note that in the TD(λ) algorithm, the value function estimate of every state gets updated at each step, unlike in the n-step TD algorithms, where only one state's estimate gets updated per step. If a state has been visited neither recently nor frequently, then the eligibility of that state, i.e., the associated entry of the eligibility vector, will be close to zero. Therefore, the update via the TD error will take very small steps for such states.
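To illustrate the contrast, the sketch below applies a one-step TD(0) update and a single TD(λ) update to the same transition. The starting trace `[0.9, 0.01, 0, 0]` is a hypothetical example (state 0 visited recently, state 1 long ago), and the function names are illustrative.

```python
import numpy as np

def td0_step(V, s, r, v_next, alpha, gamma):
    """One-step TD(0): only the current state's estimate changes."""
    V = V.copy()
    V[s] += alpha * (r + gamma * v_next - V[s])
    return V

def td_lambda_step(V, z, s, r, v_next, alpha, gamma, lam):
    """One TD(lambda) step: every state moves by alpha * delta * z(s)."""
    V, z = V.copy(), gamma * lam * z      # decay traces (new array, caller's z untouched)
    z[s] += 1.0                           # current state becomes fully eligible
    delta = r + gamma * v_next - V[s]     # TD error computed with the old V
    V += alpha * delta * z
    return V, z

V0 = np.zeros(4)
z_prev = np.array([0.9, 0.01, 0.0, 0.0])  # hypothetical traces before this step
V_td0 = td0_step(V0, 2, 1.0, 0.0, 0.5, 1.0)
V_lam, z_new = td_lambda_step(V0, z_prev, 2, 1.0, 0.0, 0.5, 1.0, 0.5)
```

TD(0) changes only state 2, while TD(λ) also moves state 0 by 0.225 and state 1 by only 0.0025, because state 1's eligibility has almost decayed away; state 3, never visited, does not move at all.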
Though the λ-return algorithm is forward-looking while TD(λ) is backward-looking, the two are equivalent, as you will show next for the finite-horizon problem with horizon length $T$.
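(d) Assume that the initial value function estimates are zero, i.e., $V_0(s) = 0$ for all $s \in \mathcal{S}$. Then, the recursive update in the λ-return algorithm yields that the final value function estimate can be written as

Correspondingly, the recursive update in the TD(λ) algorithm yields that the final value function estimate can be written as

Show that these two expressions are equal for all $s \in \mathcal{S}$.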
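The exact expressions in (d) did not survive extraction, so the sketch below only checks the standard offline form of the equivalence, assuming $V$ is held fixed during the episode, $s_T$ is terminal with value 0, and the λ-return is truncated at $T$ (the Monte Carlo return receives the leftover weight $\lambda^{T-t-1}$). Under those assumptions, $\sum_{t=0}^{T-1} \delta_t z_t(s) = \sum_{t=0}^{T-1} \big(G_t^\lambda - V(s_t)\big)\, \mathbb{I}\{s_t = s\}$ for every $s$. All function names are hypothetical.

```python
import numpy as np

def n_step_return(V, states, rewards, t, n, gamma):
    """G_{t:t+n}; bootstraps from V(s_{t+n}) unless the episode ends first."""
    T = len(rewards)
    G = sum(gamma ** i * rewards[t + i] for i in range(min(n, T - t)))
    if t + n < T:
        G += gamma ** n * V[states[t + n]]
    return G

def lambda_return(V, states, rewards, t, gamma, lam):
    """Truncated lambda-return: the full return gets weight lam^(T-t-1)."""
    T = len(rewards)
    G = sum((1 - lam) * lam ** (n - 1) * n_step_return(V, states, rewards, t, n, gamma)
            for n in range(1, T - t))
    return G + lam ** (T - t - 1) * n_step_return(V, states, rewards, t, T - t, gamma)

def forward_view(V, states, rewards, gamma, lam):
    """sum_t (G_t^lambda - V(s_t)) * I{s_t = s}, one entry per state."""
    out = np.zeros_like(V)
    for t in range(len(rewards)):
        out[states[t]] += lambda_return(V, states, rewards, t, gamma, lam) - V[states[t]]
    return out

def backward_view(V, states, rewards, gamma, lam):
    """sum_t delta_t * z_t(s), with V fixed over the episode (offline updating)."""
    T = len(rewards)
    z, out = np.zeros_like(V), np.zeros_like(V)
    for t in range(T):
        v_next = 0.0 if t + 1 == T else V[states[t + 1]]
        delta = rewards[t] + gamma * v_next - V[states[t]]
        z = gamma * lam * z
        z[states[t]] += 1.0
        out += delta * z
    return out
```

Multiplying either sum by $\alpha$ gives the same total increment to $V(s)$. With online updating, where $V$ changes inside the episode, the two views generally agree only approximately, so this check should not be read as the intended solution to (d).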