Question

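Problem 1. (50pt) Given a Markov stationary policy $\pi$, consider the policy evaluation problem
to compute $v^\pi$. For example, we can apply the temporal difference (TD) learning algorithm
given by
$$v_{t+1}(s) = v_t(s) + \alpha\,\delta_t \cdot \mathbb{I}_{\{s_t = s\}},$$
where $\delta_t := r_t + \gamma v_t(s_{t+1}) - v_t(s_t)$ is known as the TD error. Alternatively, we can apply the $n$-step
TD learning algorithm given by
$$v_{t+1}(s) = v_t(s) + \alpha\,\bigl(G_t^{(n)} - v_t(s)\bigr) \cdot \mathbb{I}_{\{s_t = s\}},$$
where $G_t^{(n)} := r_t + \gamma r_{t+1} + \dots + \gamma^{n-1} r_{t+n-1} + \gamma^n v_t(s_{t+n})$ for $n = 1, 2, \dots$. Note that
$\delta_t = G_t^{(1)} - v_t(s_t)$.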
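To make the n-step update concrete, here is a minimal Python sketch. It assumes the symbols lost in the statement are the step size `alpha` and the discount factor `gamma`, and it uses a hypothetical three-state toy trajectory. The function names `n_step_return` and `n_step_td_update` are illustrative, not part of the problem.

```python
import numpy as np

def n_step_return(rewards, states, v, t, n, gamma):
    """G_t^(n) = sum_{i<n} gamma^i r_{t+i} + gamma^n v(s_{t+n})."""
    discounts = gamma ** np.arange(n)
    return float(discounts @ rewards[t:t + n] + gamma ** n * v[states[t + n]])

def n_step_td_update(v, rewards, states, t, n, gamma, alpha):
    """Return a copy of v in which only the entry for s_t moves toward G_t^(n)."""
    v_new = v.copy()
    s = states[t]
    v_new[s] += alpha * (n_step_return(rewards, states, v, t, n, gamma) - v[s])
    return v_new

# Hypothetical toy data: visited states s_0..s_4 and rewards r_0..r_3.
states = np.array([0, 1, 2, 1, 0])
rewards = np.array([1.0, 0.0, 2.0, -1.0])
v = np.array([0.5, -0.2, 1.0])       # current value estimate v_t
gamma, alpha = 0.9, 0.1              # assumed discount factor and step size

# The one-step return minus v_t(s_t) is the TD error delta_t.
delta_0 = rewards[0] + gamma * v[states[1]] - v[states[0]]
g1 = n_step_return(rewards, states, v, 0, 1, gamma)

# A two-step update at t = 0 changes only the entry for s_0.
v_after = n_step_td_update(v, rewards, states, 0, 2, gamma, alpha)
```

Note that the update at time t can only be computed once the trajectory up to time t+n has been observed, which is the delay the problem statement refers to.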
The $n$-step TD algorithms for $n < \infty$ use bootstrapping. Therefore, they use a biased estimate
of $v^\pi$. On the other hand, as $n \to \infty$, the $n$-step TD algorithm becomes a Monte Carlo method,
where we use an unbiased estimate of $v^\pi$. However, these approaches delay the update for $n$
stages and we update the value function estimate only for the current state.
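As an intermediate step to address these challenges, we first introduce the $\lambda$-return algorithm
given by
$$v_{t+1}(s) = v_t(s) + \alpha\,\bigl(G_t^\lambda - v_t(s)\bigr) \cdot \mathbb{I}_{\{s_t = s\}},$$
where, given $\lambda \in [0,1]$, we define $G_t^\lambda := (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)}$, taking a weighted average of
the $G_t^{(n)}$'s.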
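As a quick check that this is a proper weighted average, the weights form a geometric series. The lines below assume the lost symbol is $\lambda$ and use the convention $0^0 = 1$:

$$(1-\lambda)\sum_{n=1}^{\infty} \lambda^{n-1} = (1-\lambda)\cdot\frac{1}{1-\lambda} = 1 \quad \text{for } \lambda \in [0,1),$$

$$\lambda = 0: \quad G_t^{\lambda} = G_t^{(1)} = r_t + \gamma v_t(s_{t+1}).$$

So $\lambda = 0$ recovers the one-step TD target, while values of $\lambda$ close to 1 shift the weight toward long-horizon returns.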
(a) By the definition of $G_t^{(n)}$, we can show that $G_t^{(n)} = r_t + \gamma G_{t+1}^{(n-1)}$. Derive an analogous
recursive relationship for $G_t^\lambda$ and $G_{t+1}^\lambda$.
(b) Show that the term $G_t^\lambda - v_t(s)$ in the $\lambda$-return update can be written as the sum of TD errors.
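Before attempting the derivations, candidate identities for (a) and (b) can be sanity-checked numerically. The sketch below is not a proof, and it rests on several assumptions that are not in the problem statement: the value estimate is held fixed along the trajectory, the episode ends in an absorbing state with zero reward and zero value, and the infinite sum is truncated at `N` terms. The two identities being tested are candidates, and all names are illustrative.

```python
import numpy as np

gamma, lam = 0.9, 0.8       # assumed discount factor and lambda
T, N = 6, 400               # hypothetical episode length; N truncates the infinite sum

rng = np.random.default_rng(0)
r = np.zeros(T + N + 1)     # rewards r_k, zero once the episode has ended
vs = np.zeros(T + N + 2)    # vs[k] = v(s_k) for a FIXED estimate v, zero from time T on
r[:T] = rng.normal(size=T)
vs[:T] = rng.normal(size=T)

def n_step_return(t, n):
    """G_t^(n) computed directly from its definition."""
    disc = gamma ** np.arange(n)
    return disc @ r[t:t + n] + gamma ** n * vs[t + n]

def lambda_return(t):
    """G_t^lambda from its definition, truncated after N terms."""
    weights = (1 - lam) * lam ** np.arange(N)   # (1 - lam) * lam^(n-1), n = 1..N
    returns = np.array([n_step_return(t, n) for n in range(1, N + 1)])
    return weights @ returns

delta = r + gamma * vs[1:] - vs[:-1]            # TD errors delta_k

# Candidate for (a): G_t = r_t + gamma * ((1 - lam) * v(s_{t+1}) + lam * G_{t+1})
lhs_a = lambda_return(0)
rhs_a = r[0] + gamma * ((1 - lam) * vs[1] + lam * lambda_return(1))

# Candidate for (b): G_t - v(s_t) = sum_k (gamma * lam)^k * delta_{t+k}
lhs_b = lambda_return(0) - vs[0]
rhs_b = ((gamma * lam) ** np.arange(N)) @ delta[:N]
```

Comparing `lhs_a` with `rhs_a` and `lhs_b` with `rhs_b` using `np.isclose` shows whether a candidate is plausible. A mismatch would indicate an error in the candidate formula or in one of the assumptions above.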
The TD algorithm, Monte Carlo method and $\lambda$-return algorithm look forward to approximate
$v^\pi$. Alternatively, we can look backward via the eligibility trace method. The TD($\lambda$)
algorithm is given by
$$z_t(s) = \gamma\lambda\, z_{t-1}(s) + \mathbb{I}_{\{s = s_t\}}, \quad \forall s \in S,$$
$$v_{t+1}(s) = v_t(s) + \alpha\,\delta_t z_t(s), \quad \forall s \in S,$$
where $z_t \in \mathbb{R}^{|S|}$ is called the eligibility vector and the initial $z_{-1}(s) = 0$ for all $s$.
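The backward view can be sketched as a short loop. This is a minimal illustration for a tabular value function, assuming the trace decays by the product `gamma * lam` (the standard accumulating trace) and using a hypothetical toy trajectory. The function name `td_lambda_episode` is illustrative.

```python
import numpy as np

def td_lambda_episode(v, states, rewards, gamma, lam, alpha):
    """One pass of TD(lambda) with accumulating traces over a trajectory.

    `states` holds s_0..s_T and `rewards` holds r_0..r_{T-1}.
    Returns the updated value estimate and the final eligibility vector.
    """
    v = v.copy()
    z = np.zeros_like(v)                  # z_{-1}(s) = 0 for all s
    for t in range(len(rewards)):
        s, s_next = states[t], states[t + 1]
        delta = rewards[t] + gamma * v[s_next] - v[s]   # TD error delta_t
        z *= gamma * lam                  # decay every trace
        z[s] += 1.0                       # bump the trace of the current state
        v += alpha * delta * z            # update ALL states, weighted by their traces
    return v, z

# Hypothetical toy data: visited states s_0..s_4 and rewards r_0..r_3.
states = np.array([0, 1, 2, 1, 0])
rewards = np.array([1.0, 0.0, 2.0, -1.0])
v0 = np.array([0.5, -0.2, 1.0])
gamma, lam, alpha = 0.9, 0.5, 0.1

v_new, z_final = td_lambda_episode(v0, states, rewards, gamma, lam, alpha)

# With lam = 0 the trace is one-hot on the current state, so each step
# reduces to the one-step TD update.
v_td0, z_td0 = td_lambda_episode(v0, states, rewards, gamma, 0.0, alpha)
```

Unlike the forward-looking updates, each step here uses only the current transition, and every state with a nonzero trace is updated, not just the current one.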
C