
Question



Problem 1. (50 pt) Given a Markov stationary policy $\pi$, consider the policy evaluation problem of computing $v^\pi$. For example, we can apply the temporal difference (TD) learning algorithm

$$v_{t+1}(s) = v_t(s) + \alpha\,\delta_t\,\mathbb{1}_{\{s_t = s\}},$$

where $\delta_t := r_t + \gamma v_t(s_{t+1}) - v_t(s_t)$ is known as the TD error. Alternatively, we can apply the $n$-step TD learning algorithm

$$v_{t+1}(s) = v_t(s) + \alpha\big(G_t^{(n)} - v_t(s)\big)\mathbb{1}_{\{s_t = s\}},$$

where

$$G_t^{(n)} := r_t + \gamma r_{t+1} + \cdots + \gamma^{n-1} r_{t+n-1} + \gamma^{n} v_t(s_{t+n}), \qquad n = 1, 2, \ldots$$

Note that $\delta_t = G_t^{(1)} - v_t(s_t)$. The $n$-step TD algorithms with $n < \infty$ use bootstrapping; therefore, they use biased estimates of $v^\pi$. On the other hand, as $n \to \infty$, the $n$-step TD algorithm becomes a Monte Carlo method, which uses an unbiased estimate of $v^\pi$. However, these approaches delay the update by $n$ stages, and they update the value function estimate only for the current state.

As an intermediate step to address these challenges, we first introduce the $\lambda$-return algorithm

$$v_{t+1}(s) = v_t(s) + \alpha\big(G_t^{\lambda} - v_t(s)\big)\mathbb{1}_{\{s_t = s\}},$$

where, given $\lambda \in [0,1]$, we define

$$G_t^{\lambda} := (1-\lambda)\sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)},$$

which is a weighted average of the $G_t^{(n)}$'s.

(a) By the definition of $G_t^{(n)}$, we can show that $G_t^{(n)} = r_t + \gamma G_{t+1}^{(n-1)}$. Derive an analogous recursive relationship between $G_t^{\lambda}$ and $G_{t+1}^{\lambda}$.

(b) Show that the term $G_t^{\lambda} - v_t(s)$ in the $\lambda$-return update can be written as a sum of TD errors.

The TD algorithm, the Monte Carlo method, and the $\lambda$-return algorithm look forward to approximate $v^\pi$. Alternatively, we can look backward via the eligibility trace method. The TD($\lambda$)
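The TD(0) update can be sketched in a few lines of Python. This is a minimal illustration, not part of the original problem: the names `td0_update` and `td0_evaluate`, the three-state chain, and its rewards are hypothetical, and replaying one deterministic episode stands in for trajectories sampled under $\pi$. Each step changes only the entry for the visited state $s_t$, which is the role of the indicator $\mathbb{1}_{\{s_t = s\}}$.

```python
from typing import List, Optional, Tuple

# Hypothetical transition format: (s_t, r_t, s_{t+1}); s_{t+1} = None marks termination,
# where the terminal value is taken to be 0.
Transition = Tuple[int, float, Optional[int]]


def td0_update(v: List[float], transition: Transition, alpha: float, gamma: float) -> float:
    """Apply v(s_t) <- v(s_t) + alpha * delta_t in place and return delta_t.

    Only v[s_t] changes; every other entry is untouched (the indicator 1{s_t = s}).
    """
    s, r, s_next = transition
    v_next = 0.0 if s_next is None else v[s_next]
    delta = r + gamma * v_next - v[s]  # TD error delta_t
    v[s] += alpha * delta
    return delta


def td0_evaluate(episode: List[Transition], num_states: int, alpha: float,
                 gamma: float, num_episodes: int) -> List[float]:
    """Run TD(0) by replaying the same (deterministic, hypothetical) episode repeatedly."""
    v = [0.0] * num_states
    for _ in range(num_episodes):
        for transition in episode:
            td0_update(v, transition, alpha, gamma)
    return v


if __name__ == "__main__":
    # Hypothetical chain s0 -> s1 -> s2 -> terminal with reward 1 on every step.
    chain = [(0, 1.0, 1), (1, 1.0, 2), (2, 1.0, None)]
    v = td0_evaluate(chain, num_states=3, alpha=0.5, gamma=0.9, num_episodes=200)
    print([round(x, 4) for x in v])
```

On this toy chain, with $\alpha = 0.5$ and $\gamma = 0.9$, the estimates approach the true values $v(s_2) = 1$, $v(s_1) = 1.9$, $v(s_0) = 2.71$. With stochastic transitions, a constant step size would typically leave the estimates fluctuating around $v^\pi$ rather than settling exactly.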

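For a finite episode, the $\lambda$-return can be computed directly from its definition, which also gives a quick numerical sanity check for parts (a) and (b). This sketch rests on assumptions the problem does not state: the episode terminates at time $T$ with $v(s_T) = 0$ (so every $G_t^{(n)}$ with $n \ge T - t$ equals the Monte Carlo return), the value estimate is held fixed over the episode (ignoring that $v_t$ changes with $t$), and the rewards, values, and function names are hypothetical. Under these assumptions it checks the candidate identities $G_t^{\lambda} = r_t + \gamma\big[(1-\lambda)v(s_{t+1}) + \lambda G_{t+1}^{\lambda}\big]$ and $G_t^{\lambda} - v(s_t) = \sum_{k \ge 0} (\gamma\lambda)^k \delta_{t+k}$.

```python
import math
from typing import List


def n_step_return(rewards: List[float], values: List[float], t: int, n: int,
                  gamma: float) -> float:
    """G_t^(n) = r_t + ... + gamma^(n-1) r_{t+n-1} + gamma^n v(s_{t+n}).

    rewards[k] = r_k for k < T; values[k] = v(s_k) for k <= T, with values[T] = 0
    (terminal). Rewards after termination are 0, so the reward sum stops at T.
    """
    T = len(rewards)
    h = min(n, T - t)  # number of real reward terms before reaching the terminal state
    g = sum(gamma ** k * rewards[t + k] for k in range(h))
    return g + gamma ** h * values[t + h]


def lambda_return(rewards: List[float], values: List[float], t: int,
                  gamma: float, lam: float) -> float:
    """G_t^lambda = (1 - lam) * sum_{n>=1} lam^(n-1) G_t^(n) for an episode ending at T.

    For n >= N = T - t every G_t^(n) is the Monte Carlo return, and those tail
    weights sum to lam^(N-1), so the infinite sum collapses to a finite one.
    """
    T = len(rewards)
    if t == T:
        return 0.0  # every return from the terminal state is 0
    N = T - t
    head = sum(lam ** (n - 1) * n_step_return(rewards, values, t, n, gamma)
               for n in range(1, N))
    return (1 - lam) * head + lam ** (N - 1) * n_step_return(rewards, values, t, N, gamma)


def td_errors(rewards: List[float], values: List[float], gamma: float) -> List[float]:
    """delta_t = r_t + gamma * v(s_{t+1}) - v(s_t) for a fixed value estimate."""
    return [rewards[t] + gamma * values[t + 1] - values[t] for t in range(len(rewards))]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 2.0, -1.0, 0.5]      # hypothetical r_0..r_4 (T = 5)
    values = [0.5, 1.0, -0.3, 0.8, 0.2, 0.0]  # hypothetical v(s_0)..v(s_5), v(s_T) = 0
    gamma, lam = 0.9, 0.7
    deltas = td_errors(rewards, values, gamma)
    for t in range(len(rewards)):
        g_t = lambda_return(rewards, values, t, gamma, lam)
        g_next = lambda_return(rewards, values, t + 1, gamma, lam)
        # Recursive form relating G_t^lambda and G_{t+1}^lambda.
        recursive = rewards[t] + gamma * ((1 - lam) * values[t + 1] + lam * g_next)
        # lambda-return error as a discounted sum of TD errors.
        td_sum = sum((gamma * lam) ** k * deltas[t + k] for k in range(len(rewards) - t))
        assert math.isclose(g_t, recursive, abs_tol=1e-12)
        assert math.isclose(g_t - values[t], td_sum, abs_tol=1e-12)
    print("both identities hold for every t")
```

Setting `lam=0` recovers the one-step target $G_t^{(1)}$, and `lam=1` recovers the Monte Carlo return, matching the two extremes described in the problem.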
