
Question


Problem 1. (50 pt) Given a Markov stationary policy $\pi$, consider the policy evaluation problem to compute $v^{\pi}$. For example, we can apply the temporal difference (TD) learning algorithm given by

$$v_{t+1}(s) = v_t(s) + \alpha\, \delta_t\, I_{\{s_t = s\}},$$

where $\delta_t := r_t + \gamma v_t(s_{t+1}) - v_t(s_t)$ is known as the TD error. Alternatively, we can apply the $n$-step TD learning algorithm given by

$$v_{t+1}(s) = v_t(s) + \alpha \left( G_t^{(n)} - v_t(s) \right) I_{\{s_t = s\}},$$

where

$$G_t^{(n)} := r_t + \gamma r_{t+1} + \dots + \gamma^{n-1} r_{t+n-1} + \gamma^{n} v_t(s_{t+n}) \quad \text{for } n = 1, 2, \dots$$

Note that $\delta_t = G_t^{(1)} - v_t(s_t)$. The $n$-step TD algorithms for $n < \infty$ use bootstrapping. Therefore, they use a biased estimate of $v^{\pi}$. On the other hand, as $n \to \infty$, the $n$-step TD algorithm becomes a Monte Carlo method, where we use an unbiased estimate of $v^{\pi}$. However, these approaches delay the update for $n$ stages, and we update the value function estimate only for the current state. As an intermediate step to address these challenges, we first introduce the $\lambda$-return algorithm given by

$$v_{t+1}(s) = v_t(s) + \alpha \left( G_t^{\lambda} - v_t(s) \right) I_{\{s_t = s\}},$$

where, given $\lambda \in [0, 1]$, we define

$$G_t^{\lambda} := (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)},$$

taking a weighted average of the $G_t^{(n)}$'s.

(a) By the definition of $G_t^{(n)}$, we can show that $G_t^{(n)} = r_t + \gamma G_{t+1}^{(n-1)}$. Derive an analogous recursive relationship for $G_t^{\lambda}$ and $G_{t+1}^{\lambda}$.

(b) Show that the term $G_t^{\lambda} - v_t(s)$ in the $\lambda$-return update can be written as the sum of TD errors.

The TD algorithm, the Monte Carlo method, and the $\lambda$-return algorithm look forward to approximate $v^{\pi}$. Alternatively, we can look backward via the eligibility trace method. The TD($\lambda$)
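The definitions of $G_t^{(n)}$, $G_t^{\lambda}$, and $\delta_t$ above can be explored numerically. The following is a minimal Python sketch, not a solution from the source. It assumes an episodic trajectory that terminates after `T` rewards, a terminal value of zero, and a value estimate that stays fixed along the episode (the problem statement lets $v_t$ change with $t$). Under those assumptions every $n$-step return with $n \ge T - t$ equals the Monte Carlo return, so the infinite sum collapses to a finite one. All function names are hypothetical.

```python
import random

def n_step_return(rewards, values, t, n, gamma):
    """n-step return G_t^(n). values[k] estimates v(s_k); values[T] must be 0 (terminal)."""
    T = len(rewards)
    end = min(t + n, T)  # truncate at the end of the episode
    g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
    return g + gamma ** (end - t) * values[end]  # bootstrap from the value estimate

def lambda_return(rewards, values, t, gamma, lam):
    """Lambda-return G_t^lambda for t < T, as a weighted average of n-step returns."""
    horizon = len(rewards) - t
    g = sum((1 - lam) * lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
            for n in range(1, horizon))
    # All n >= horizon give the Monte Carlo return; their total weight is lam^(horizon-1).
    return g + lam ** (horizon - 1) * n_step_return(rewards, values, t, horizon, gamma)

def td_errors(rewards, values, gamma):
    """One-step TD errors delta_k = r_k + gamma * v(s_{k+1}) - v(s_k)."""
    return [rewards[k] + gamma * values[k + 1] - values[k] for k in range(len(rewards))]

def lambda_return_from_td_errors(rewards, values, t, gamma, lam):
    """v(s_t) plus the (gamma * lam)-discounted sum of TD errors from time t onward."""
    deltas = td_errors(rewards, values, gamma)
    return values[t] + sum((gamma * lam) ** (k - t) * deltas[k]
                           for k in range(t, len(rewards)))

if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, illustration only
    rewards = [rng.uniform(-1, 1) for _ in range(5)]
    values = [rng.uniform(-1, 1) for _ in range(5)] + [0.0]  # terminal value is 0
    for t in range(5):
        forward = lambda_return(rewards, values, t, 0.9, 0.5)
        backward = lambda_return_from_td_errors(rewards, values, t, 0.9, 0.5)
        print(t, forward, backward)
```

In this setting the two computations should agree up to floating-point error, which is the kind of relationship part (b) asks to establish analytically. Setting `lam` to 0 recovers the one-step TD target, and setting it to 1 recovers the Monte Carlo return.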
