
Question


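Problem 5. (40pt) As shown in the class, given an MDP problem, the mirror descent update with $B(\pi, \tilde{\pi}) = \sum_{s \in \mathcal{S}} d_\rho^{\pi}(s)\, D_{\mathrm{KL}}(\pi(\cdot \mid s) \,\|\, \tilde{\pi}(\cdot \mid s))$ is given by

$$\pi_{k+1}(\cdot \mid s) = \arg\max_{\pi(\cdot \mid s) \in \Delta(\mathcal{A})} \; \alpha_k \sum_{a \in \mathcal{A}} Q^{\pi_k}(s, a)\,\bigl(\pi(a \mid s) - \pi_k(a \mid s)\bigr) - D_{\mathrm{KL}}\bigl(\pi(\cdot \mid s) \,\|\, \pi_k(\cdot \mid s)\bigr)$$

for all $s \in \mathcal{S}$, where $d_\rho^{\pi} \in \Delta(\mathcal{S})$ is the discounted state visitation distribution for the Markov stationary policy $\pi$, $\rho \in \Delta(\mathcal{S})$ is the initial state distribution, and $D_{\mathrm{KL}}(\cdot \,\|\, \cdot)$ denotes the KL divergence. Show that this update is equivalent to

$$\pi_{k+1}(a \mid s) = \frac{\pi_k(a \mid s)\, \exp\bigl(\alpha_k Q^{\pi_k}(s, a)\bigr)}{\sum_{a' \in \mathcal{A}} \pi_k(a' \mid s)\, \exp\bigl(\alpha_k Q^{\pi_k}(s, a')\bigr)} \quad \forall (s, a).$$

Note that this is a constrained optimization problem due to $\pi(\cdot \mid s) \in \Delta(\mathcal{A})$. You can apply Lagrange multiplier methods with the first-order optimality conditions to solve it, as the KL divergence $D_{\mathrm{KL}}(\pi \,\|\, \tilde{\pi})$ is a convex function of $\pi$ when $\tilde{\pi}$ is fixed.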
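
The Lagrange multiplier argument suggested in the hint can be sketched as follows. This is one possible derivation, not the page's own solution. It fixes a state $s$ and uses shorthand that is not in the original problem: $p_a = \pi(a \mid s)$, $q_a = \pi_k(a \mid s)$, $Q_a = Q^{\pi_k}(s, a)$, and a multiplier $\lambda$ for the normalization constraint. It assumes that the step size $\alpha_k > 0$ multiplies the linear term (placing $1/\alpha_k$ on the KL term instead gives the same maximizer) and that $q_a > 0$ for every action, so the nonnegativity constraints are inactive.

```latex
\begin{align*}
% Per-state problem: a linear term minus a convex KL term, hence concave in p.
&\max_{p \in \Delta(\mathcal{A})} \;
  \alpha_k \sum_{a} Q_a \,(p_a - q_a) - \sum_{a} p_a \log\frac{p_a}{q_a} \\[4pt]
% Lagrangian with multiplier \lambda for the constraint \sum_a p_a = 1.
&\mathcal{L}(p, \lambda)
  = \alpha_k \sum_{a} Q_a \,(p_a - q_a)
  - \sum_{a} p_a \log\frac{p_a}{q_a}
  + \lambda \Bigl(1 - \sum_{a} p_a\Bigr) \\[4pt]
% First-order condition in each coordinate p_a.
&\frac{\partial \mathcal{L}}{\partial p_a}
  = \alpha_k Q_a - \log\frac{p_a}{q_a} - 1 - \lambda = 0
  \;\;\Longrightarrow\;\;
  p_a = q_a \exp(\alpha_k Q_a)\, \exp(-1 - \lambda) \\[4pt]
% Enforce normalization to eliminate \lambda.
&\sum_{a'} p_{a'} = 1
  \;\;\Longrightarrow\;\;
  \exp(-1 - \lambda) = \frac{1}{\sum_{a'} q_{a'} \exp(\alpha_k Q_{a'})} \\[4pt]
% Substitute back: the multiplicative-weights (softmax) form.
&p_a = \frac{q_a \exp(\alpha_k Q_a)}{\sum_{a'} q_{a'} \exp(\alpha_k Q_{a'})}
\end{align*}
```

Because the objective is concave in $p$ and the only active constraint is linear, the stationary point of the Lagrangian is the global maximizer. The resulting $p_a$ is strictly positive whenever $q_a > 0$, which is consistent with dropping the nonnegativity constraints. Substituting back $p_a = \pi_{k+1}(a \mid s)$, $q_a = \pi_k(a \mid s)$, and $Q_a = Q^{\pi_k}(s, a)$ gives the claimed update for every $(s, a)$.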

