Question

Problem 2 (The Value Functions) (40 pts): Following the notation in the lecture notes, or alternatively in Chapter 3.5 of the book by Sutton and Barto, answer the following questions.

(a) Give an equation for $q_\pi$ in terms of the transition probability $p(s', r \mid s, a)$ and the value function $v_\pi$. (Hint: The action-value function $q_\pi(s,a)$ is the expected return of taking action $a$ in state $s$ and following policy $\pi$ thereafter. The agent may receive a random immediate reward $r$ and reach a random next state $s'$, whose value is given by $v_\pi(s')$.)

(b) Give an equation for $v_\pi$ in terms of $q_\pi$ and $\pi$. (Hint: Use the result of part (a) and the Bellman equation for $v_\pi$.)

(c) Derive the Bellman equation for $q_\pi$; that is, express $q_\pi(s,a)$ in terms of $q_\pi(s',a')$. Rearrange the expression so that the summations are next to each other, as in the Bellman equation for $v_\pi$. (Hint: Use the results of parts (a) and (b). Make sure to use the notation $a'$ for the action taken in the next state $s'$.)
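A sketch of the standard answers, written in Sutton and Barto's notation. The discount factor $\gamma$ is an assumption here, since the problem statement does not name it.

$$\text{(a)}\quad q_\pi(s,a) = \sum_{s',r} p(s',r \mid s,a)\,\bigl[r + \gamma\, v_\pi(s')\bigr]$$

$$\text{(b)}\quad v_\pi(s) = \sum_{a} \pi(a \mid s)\, q_\pi(s,a)$$

$$\text{(c)}\quad q_\pi(s,a) = \sum_{s',r} p(s',r \mid s,a)\,\Bigl[r + \gamma \sum_{a'} \pi(a' \mid s')\, q_\pi(s',a')\Bigr] = \sum_{s',r} \sum_{a'} p(s',r \mid s,a)\, \pi(a' \mid s')\,\bigl[r + \gamma\, q_\pi(s',a')\bigr]$$

The first form of (c) substitutes (b), evaluated at $s'$ with action variable $a'$, for $v_\pi(s')$ in (a). The second form moves $r$ inside the inner sum using $\sum_{a'} \pi(a' \mid s') = 1$, which places the summations next to each other.

The identities can also be checked numerically. The Python sketch below uses a made-up two-state MDP. The states `A`/`B`, actions `stay`/`go`, rewards, `GAMMA`, and policy `PI` are all hypothetical. It computes $v_\pi$ by iterative policy evaluation, builds $q_\pi$ with (a), and asserts that (b) and the rearranged form of (c) hold.

```python
# Sketch only: a hypothetical two-state MDP used to check parts (a)-(c).
# States, actions, rewards, GAMMA, and the policy are illustrative assumptions.

GAMMA = 0.9  # assumed discount factor

# Dynamics p(s', r | s, a): for each (s, a), a list of (probability, s', r).
P = {
    ("A", "stay"): [(1.0, "A", 1.0)],
    ("A", "go"): [(0.8, "B", 0.0), (0.2, "A", 0.0)],
    ("B", "stay"): [(1.0, "B", 2.0)],
    ("B", "go"): [(1.0, "A", -1.0)],
}
STATES = ["A", "B"]
ACTIONS = ["stay", "go"]

# Stochastic policy pi(a | s); each row sums to 1.
PI = {
    "A": {"stay": 0.5, "go": 0.5},
    "B": {"stay": 0.7, "go": 0.3},
}


def q_from_v(v, s, a):
    """Part (a): q(s,a) = sum_{s',r} p(s',r|s,a) [r + gamma * v(s')]."""
    return sum(prob * (r + GAMMA * v[s2]) for prob, s2, r in P[(s, a)])


def v_from_q(q, s):
    """Part (b): v(s) = sum_a pi(a|s) q(s,a)."""
    return sum(PI[s][a] * q[(s, a)] for a in ACTIONS)


def bellman_q_rhs(q, s, a):
    """Part (c), rearranged: sum_{s',r} sum_{a'} p(s',r|s,a) pi(a'|s') [r + gamma q(s',a')]."""
    return sum(
        prob * PI[s2][a2] * (r + GAMMA * q[(s2, a2)])
        for prob, s2, r in P[(s, a)]
        for a2 in ACTIONS
    )


def evaluate_policy(tol=1e-12, max_iters=10_000):
    """Iterative policy evaluation for v_pi: repeatedly apply (b) after (a)."""
    v = {s: 0.0 for s in STATES}
    for _ in range(max_iters):
        new_v = {s: sum(PI[s][a] * q_from_v(v, s, a) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_v[s] - v[s]) for s in STATES)
        v = new_v
        if delta < tol:
            break
    return v


v = evaluate_policy()
q = {(s, a): q_from_v(v, s, a) for s in STATES for a in ACTIONS}

for s in STATES:
    # (b): averaging q over the policy recovers v.
    assert abs(v_from_q(q, s) - v[s]) < 1e-9
    for a in ACTIONS:
        # (c): q is a fixed point of its own Bellman equation.
        assert abs(bellman_q_rhs(q, s, a) - q[(s, a)]) < 1e-9

print({k: round(x, 4) for k, x in q.items()})
```

Solving for $v_\pi$ first and deriving $q_\pi$ through (a) mirrors the order of the problem's parts. Iterating the Bellman equation for $q_\pi$ directly would work equally well.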

